In This Issue


Despite the best efforts by statistical agencies in counting people in a census, a small undercount 
always remains. These undercounts are usually not uniformly distributed over various subgroups 
of the population and therefore they impact differently on various government programs that 
use census population figures. Consequently, methods of measuring undercounts, adjustment 
techniques, especially for local areas, and related issues have attracted a great deal of attention 
from policy makers and statisticians. The six articles included in the special section on Census 
Undercount Measurement Methods and Issues will be a valuable addition to the growing literature 
on this topic. 

The first article in the section is a discussion paper by Freedman and Navidi. It reviews some 
of the statistical issues and arguments for and against adjusting the United States Census of 1980 
and discusses statistical evidence presented in a trial against the Department of Commerce
and the U.S. Bureau of the Census. The article is a continuation of the discussion between the
authors and Ericksen, Kadane and Tukey, who proposed a methodology for the adjustment.
It also shows how some of the conflicting views were resolved by the trial court. The article is 
followed by very insightful and lively comments from several statisticians and a reply from the 
authors. 

Cressie presents an empirical Bayes approach to prediction of undercount at subnational levels 
based on restricted maximum likelihood (REML). The claimed advantage of the REML 
estimators is that they do not tend to oversmooth the post-enumeration survey data as maximum 
likelihood estimation does. The REML estimators are compared with the maximum likelihood 
and method-of-moments estimators by simulation and example. 

Prior to the 1990 U.S. Census, a dress rehearsal took place in the state of Missouri. Datta 
et al. use the data from this exercise to study procedures for modelling from census post- 
enumeration surveys. They consider both hierarchical and empirical Bayes approaches. The 
results indicate that both approaches lead to improvements on the dual system estimation 
approach. The authors conclude with an update in light of the adjustment of the actual 1990 
U.S. Census. 

Four estimators of the base population used as a benchmark in the Population Estimates 
Program of Statistics Canada are discussed by Royce. These are the unadjusted census counts, 
adjusted census counts, a preliminary test estimator and a composite estimator. The Weighted 
Mean Square Error is used as the basis for comparison of these estimators, not only for estimation 
of population totals but also for estimation of functions of population totals, such as population
shares or growth rates.

Swain et al. give an overview of the Address Register, a frame of residential addresses for
medium and large urban centres that was created at Statistics Canada as a means of reducing
undercoverage in the 1991 Census of Canada. Methodology, post-censal evaluation and future
prospects are discussed.

The final article of the special section, by Fienberg, presents a selected annotated bibliography 
of the literature on capture-recapture estimation of population size. Capture-recapture estimation 
is the main method used to evaluate the completeness of the census counts and thus the article 
concentrates on literature related to the estimation of human populations. 

Roe, Carlson and Swanson describe a variation of the Housing Unit Method to estimate the 
population of small rural areas. In this variation, local experts provide data about selected 
households. The estimates are compared to the census counts for three rural communities. 




Xia et al. compare the statistical properties and costs of telescopic single stage cluster sampling 
with those of ordinary single stage cluster sampling. Telescopic single stage cluster sampling is
an alternative when sub-sampling of clusters (i.e. two-stage cluster sampling) is not possible. The
method has been used in the Shanghai Survey of Alzheimer’s Disease and Dementia, which serves
as an illustration of how costs can be reduced without sacrificing precision. 


The Editor 


Survey Methodology, June 1992
Vol. 18, No. 1, pp. 3-74 
Statistics Canada 


Should We Have Adjusted the U.S. Census of 1980? 
D.A. FREEDMAN and W.C. NAVIDI¹


ABSTRACT 


This paper reviews some of the arguments for and against adjusting the U.S. census of 1980, and the 
decision of the court. 


KEY WORDS: Census; Adjustment; Post Enumeration Survey; Regression; Smoothing. 


1. INTRODUCTION 


Every ten years, the census gives a statistical portrait of the United States. Geographical 
detail makes these data unique. However, the counts have more than academic interest: they 
influence the distribution of power and money. The census is used to apportion Congress as 
well as local legislatures and to allocate tax money - $40 billion per year in the late 1980s - to 
39,000 state and local governments. For these purposes, the geographical distribution of the 
population matters, rather than counts for the nation as a whole. Indeed, the census is used 
as a basis for sharing out fixed resources: if one jurisdiction gets more, another must receive 
less. Adjusting the census is advisable only if the process brings us closer to a true picture of 
the distribution of the population. 

A small undercount is thought to remain in the census, and this undercount is unlikely to 
be uniform. People who move at census time are hard to count; in rural areas, maps and address 
lists are incomplete. Central cities have heavy concentrations of poor and minority persons, 
who may be harder to enumerate. If the undercount can be estimated with good accuracy, 
especially at the local level, adjustments can - and should - be made to improve the census. 
Some statisticians argue that the undercount can be estimated well enough; others are skeptical:
a bad adjustment may be worse than nothing. 

Because of its resource implications, the undercount has attracted considerable attention 
in the media, the Congress, and the courts. After the 1980 census, New York City joined with 
other jurisdictions to sue the Department of Commerce, seeking to compel an adjustment based 
on demographic analysis and capture-recapture techniques. The Commerce Department resisted 
this pressure. The trial court framed the issue as follows: 


‘‘The plaintiffs contend that a statistical adjustment of the census will improve upon the
accuracy of the census, thereby reducing the disproportionate undercount in the City and 
State [of New York]. The Census Bureau, however, contends that although the census 
counts are imperfect, a statistical adjustment of the census will inject even greater inaccu- 
racies into the population count, and that therefore, a statistical adjustment of the census 
is not technically feasible or warranted at this time.’’ (674 F Supp 1091 = volume 674 
of the Federal Supplement, page 1091). 


¹ D.A. Freedman, Statistics Department, University of California, Berkeley, CA U.S.A. 94720; W.C. Navidi,
Mathematics Department, University of Southern California, Los Angeles, CA U.S.A. 90089.

The 1980 case may seem dated, given that the census of 1990 has already been taken.
However, among lawsuits that involve statistical principles, the 1980 census case was one of
the most important and closely argued; there is still much to learn from it. This article will 
review some of the technical issues, and some of the findings of the court. 

The balance of this section will sketch the background; for more details, see Cohen and Citro 
(1985) or Fay et al. (1988). There are two methods for evaluating the completeness of the 
counts in the U.S. Census: demographic analysis and capture-recapture. Demographic 
analysis uses administrative records (birth certificates, death certificates, immigration visas, 
etc.) to make independent estimates of population totals. The starting point is an accounting 
identity: 


Population = Births - Deaths + Immigration - Emigration. 


Demographic analysis provides estimates by age, sex and race but not ethnicity, because 
of gaps in the records. Data on immigration and emigration are incomplete; birth records are 
incomplete too, especially prior to 1935. Thus, the data going into the ‘‘identity’’ must be 
supplemented by a variety of imputations and adjustments. Furthermore, data on internal 
migration are lacking, so estimates are made primarily at the national level. This completes 
our sketch of demographic analysis. 

Estimates of coverage for small areas (including states and cities) are based on capture- 
recapture techniques. Capture is in the census; recapture is in a sample survey conducted after 
the census. In 1980, there were two such surveys, or ‘‘P-samples:’’ the April and August CPS 
(Current Population Survey). Each record from the P-samples was matched against the census 
file to see if the corresponding person was ‘‘captured,’’ that is, counted in the census. Records 
that could not be matched indicated people who were missed by the census - or a failure in 
the matching process. These data were used to estimate the percentage of persons missed by 
the census, that is, the rate of omissions. 

The census also had a small percentage of erroneous enumerations (for instance, people 
counted at two different addresses); the number was estimated by taking an ‘‘E-sample’’ of
census records and trying to check them by field work. In effect, the net undercount was
estimated by taking the difference between the omissions and erroneous enumerations. (For
details, see Fay et al., Chapter 5.) These undercount estimates were made as part of ‘‘PEP,’’
the Post Enumeration Program. 
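
The arithmetic behind this description can be sketched in a few lines. This is only an illustration of the "difference" idea stated above, not the Bureau's PEP procedure: the function names and the sample figures are hypothetical, and the sketch ignores survey weights, missing data, and the fact that the two rates are measured on different bases.

```python
def omission_rate(p_sample_size, matched):
    """Share of P-sample records that could not be matched to the census."""
    if p_sample_size <= 0 or not 0 <= matched <= p_sample_size:
        raise ValueError("need 0 <= matched <= p_sample_size and p_sample_size > 0")
    return (p_sample_size - matched) / p_sample_size


def erroneous_rate(e_sample_size, erroneous):
    """Share of E-sample census records found to be erroneous enumerations."""
    if e_sample_size <= 0 or not 0 <= erroneous <= e_sample_size:
        raise ValueError("need 0 <= erroneous <= e_sample_size and e_sample_size > 0")
    return erroneous / e_sample_size


def net_undercount_rate(omissions, erroneous):
    """Net undercount, taken 'in effect' as omissions minus erroneous enumerations."""
    return omissions - erroneous


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
omit = omission_rate(p_sample_size=1000, matched=950)   # 5% of P-sample unmatched
err = erroneous_rate(e_sample_size=1000, erroneous=30)  # 3% of E-sample erroneous
net = net_undercount_rate(omit, err)
print(f"omissions {omit:.1%}, erroneous {err:.1%}, net undercount {net:.1%}")
```

Note that an unmatched record raises the omission rate whether the person was truly missed or the match simply failed, which is why matching error feeds directly into the estimated undercount.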

In 1980, there was a fair amount of missing data in the P- and E-samples: for instance, there 
was a 4% non-interview rate in the CPS; even after interview, a determination of match status 
could not be made for another 4% of the subjects. To see the effect of missing data, a variety 
of imputation schemes were considered, leading to 12 different series of PEP estimates for 
66 subareas. 

The 66 areas covered the whole U.S. They included cities like New York; states apart from 
these cities, like upstate New York; and whole states like Wyoming. A PEP ‘‘series’’ consists 
of 66 estimates, one for each study area; 9 of the 12 series were based on the April CPS, and 
3 on the August CPS. 

In the 1980 case, expert witnesses for plaintiffs included Gene Ericksen, Jay Kadane, and 
John Tukey. Their strategy for adjusting the census using PEP data was described in Ericksen 
and Kadane (1985). Freedman (among other statisticians and demographers) testified for the 
defendants, and Navidi was a consultant. A critique of the proposed adjustments was 
summarized in Freedman and Navidi (1986), to be referenced here as FN.

We now indicate some of the technical issues. According to experts from the Bureau of
the Census: 


(a) There were substantial differences among the 12 PEP series, demonstrating that missing 
data were a serious problem. 

(b) The PEP estimates were subject to large biases, apart from the problems created by missing 
data. 

(c) Each PEP series was subject to unacceptably large sampling error. 

Ericksen and Kadane responded that one of the PEP series (‘‘PEP 2-9’’) was preferred, and
that sampling error could be substantially reduced by regression modeling. They proposed a
model with two equations. The first equation expresses the idea that $y_i$, the PEP estimate for
study area $i$, is an unbiased estimate of the true undercount $\gamma_i$ for that study area. Informally,


PEP estimate for area i = True undercount in area i + Random error.


Formally,

$$y_i = \gamma_i + \delta_i. \qquad (1)$$


The second equation expresses a theory about the variation of the undercounts from area
to area, in terms of a vector of explanatory variables $X_i$ and a vector of hyper-parameters $\alpha$.


Informally,


True undercount in area i = Linear combination of explanatory variables for area i + Random error.


Formally,

$$\gamma_i = X_i \alpha + \varepsilon_i. \qquad (2)$$


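The assumptions on the error terms can be stated as follows:

$$E(\delta_i) = E(\varepsilon_i) = 0. \qquad (3)$$
$$\operatorname{var} \delta_i = K_i, \quad \operatorname{var} \varepsilon_i = \sigma^2. \qquad (4)$$
$$\delta_1, \delta_2, \ldots, \delta_{66}, \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{66} \text{ are independent.} \qquad (5)$$
$$\delta_i \text{ and } \varepsilon_i \text{ are normally distributed.} \qquad (6)$$


In (4), $K_i$ is the split-sample variance for $y_i$ computed by the Bureau; randomness in $K_i$ is
ignored; $\sigma^2$ does not depend on $i$ and is treated as constant even though it is estimated from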
the data. The role of assumptions, and departures from them, was examined in FN; also see 
the discussion papers and rejoinder, as well as sections 6-7 below. 

The Ericksen-Kadane model was used in the 1980 case to smooth the PEP estimates, with 
the objective of reducing sampling error. The main focus of FN was a critique of that model. 
Ericksen, Kadane and Tukey (1989) - to be referenced here as EKT - replied to FN, and the 
present paper continues the exchange. 
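
To make the smoothing concrete, here is a minimal sketch of the textbook shrinkage rule that follows from a two-equation normal model of this kind when the regression coefficients and the model variance are treated as known: each PEP estimate is pulled toward its regression fitted value, with a weight that shrinks as the sampling variance grows. This is an assumption-laden illustration, not a reproduction of Ericksen and Kadane's computations; the function name `smooth` and all numbers are hypothetical.

```python
def smooth(y, K, fitted, sigma2):
    """Shrink each estimate y[i] toward its regression fitted value.

    y       PEP-style estimates, one per area
    K       sampling variances of those estimates (treated as known)
    fitted  regression fitted values for each area (treated as known)
    sigma2  variance of the model error (treated as known)
    """
    smoothed = []
    for y_i, K_i, f_i in zip(y, K, fitted):
        if sigma2 + K_i <= 0:
            raise ValueError("sigma2 + K_i must be positive")
        w = sigma2 / (sigma2 + K_i)   # weight on the direct estimate
        smoothed.append(w * y_i + (1 - w) * f_i)
    return smoothed


# Hypothetical inputs: a noisy area (K = 3) and an area measured without error (K = 0).
result = smooth(y=[4.0, 1.0], K=[3.0, 0.0], fitted=[2.0, 2.0], sigma2=1.0)
print(result)
```

The noisy estimate moves most of the way to the regression line, while the error-free one is left alone. The rule is only as good as the assumptions behind it: if the direct estimates are biased, or the errors are dependent, the weights no longer have this justification.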

EKT cited a paper by Schirm and Preston (1987), which considers adjusting states and the 
District of Columbia by the ‘‘synthetic method.’’ For instance, demographic analysis (with 
one set of assumptions on illegal immigration) estimated a national undercount rate of 5.9%
for blacks and 0.7% for whites in 1980. The synthetic method adjusts each state as follows:
increase the number of blacks by 5.9% and the number of whites by 0.7%. In short, under- 
count rates are assumed to depend on race but not geographical area - or anything else. 
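
A minimal sketch of the synthetic method as just described, taking "increase by 5.9% and 0.7%" at face value. The two states and their counts are hypothetical, invented only to show how the same national rates shift population shares toward the state with the larger black population.

```python
# National rates quoted in the text; applied identically in every state.
RATES = {"black": 0.059, "white": 0.007}


def synthetic_adjust(counts, rates=RATES):
    """Inflate each race-specific census count by its national undercount rate."""
    return {race: n * (1 + rates[race]) for race, n in counts.items()}


def shares(totals):
    """Convert state totals into shares of the national total."""
    national = sum(totals.values())
    return {state: total / national for state, total in totals.items()}


# Hypothetical census counts for two equally sized states.
census = {
    "X": {"black": 100_000, "white": 900_000},
    "Y": {"black": 400_000, "white": 600_000},
}

adjusted = {state: synthetic_adjust(counts) for state, counts in census.items()}
before = shares({s: sum(c.values()) for s, c in census.items()})
after = shares({s: sum(c.values()) for s, c in adjusted.items()})

for state in census:
    print(state, f"{before[state]:.4f} -> {after[state]:.4f}")
```

Because the rates depend on race alone, the direction of every share change is fixed in advance by each state's racial composition; nothing in the calculation can detect a state whose true undercount departs from the national pattern.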

This completes our summary of the technical background. For an update on the 1990 census, 
see Freedman (1991); some of the introductory material here was excerpted with minor changes 
from that paper. For other views, see Hogan and Wolter (1988), Schirm (1991), Wolter (1991), 
Wolter and Causey (1991), or Ericksen, Estrada, Tukey and Wolter (1991). The balance of 
the present paper responds to the salient points raised by EKT, and indicates how some of the
conflicting views were resolved by the trial court.


2. DO THE ADJUSTMENTS IMPROVE ON THE CENSUS? 


The most important question is whether adjustments improve on the census counts. EKT
‘‘... are confident of improving upon the raw census count (p. 943)’’; indeed, there are


‘‘two simple [synthetic] adjustments that improve upon the census ... the question of
the Ericksen and Kadane model is not whether it proves that adjustment is feasible, but 
whether it improves upon the simpler methods (pp. 927-8) ... Study of the method will 
not ‘‘prove’’ that an adjustment will improve the census. This has already been 
demonstrated by Schirm and Preston and the results of Tables 5 and 6 (p. 933).’’


Thus, EKT’s Tables 5 and 6 are the main pieces of empirical evidence to show that adjustment 
will improve on the census. And Table 6 on erroneous enumerations is redundant, because 
the PEP estimates in Table 5 include the effect of erroneous enumerations. Table 5 is the critical
one, and it is reproduced here as Table 1 for ease of reference. In our opinion, the table says very little
about the possibility of improving on the census; to see why, some numerical detail is needed. 
(Schirm and Preston will be discussed in the next section.) 

‘‘Group 1’’ in the table consists of 16 central cities; ‘‘group 2’’ consists of other study areas
that have relatively high minority populations; ‘‘group 3’’ consists of study areas with small 
minority populations. At best, the table shows that several methods for adjusting these groups 
are in general agreement. The table does not show that any of the methods improve on the 
accuracy of the census. It cannot, because there is no external standard against which to measure 
improvement. 

Moreover, we believe the impression of agreement in the table to be largely illusory. There 
are dramatic differences among EKT’s preferred PEP series, or between these series and the 
synthetic adjustment of Schirm and Preston. Of course, drama depends on scale, and our next 
task is choosing units. Proponents of adjustment often use ‘‘loss functions’’ to make their 
argument; squared error is a common choice: see Ericksen, Estrada, Tukey and Wolter 
(1991, p. 20). EKT view Schirm and Preston as demonstrating census adjustment to be 
advantageous, so we compute the root mean square difference between the census and the 
‘‘Synthetic B’’ line in Table 1, which is based on the Schirm and Preston adjustment. (The 
mean is weighted by population shares.) 


$$\sqrt{.11 \times (.12)^2 + .44 \times (.06)^2 + .45 \times (.18)^2} = 0.13 \text{ of } 1\%.$$


In short, 


rms difference between census and synthetic B = 0.13 of 1%. (7)


Table 1


EKT’s Table 5. Changes in National Population Shares Resulting When Counts are
Adjusted by Sample Estimates Pooled Across Areas and Synthetic Estimates.
[The entries for the three groups represent changes in shares, or differential
undercounts; the entries in the last column represent total undercounts.]


                                                          Estimated
PEP estimate           Group 1     Group 2     Group 3    undercount
                                                          rate

2-20                   + .52%      + .09%      - .61%     + 1.9%
3-20                   + .51%      + .08%      - .59%     + 1.7%
2-9                    + .50%      + .06%      - .56%     + 1.6%
3-9                    + .49%      + .04%      - .53%     + 1.4%
2-8                    + .41%      + .04%      - .45%     + 1.1%
3-8                    + .39%      + .03%      - .42%     + 1.0%
5-9                    + .31%      + .25%      - .56%     + 2.1%
5-8                    + .22%      + .23%      - .45%     + 1.7%
14-20                  + .21%      + .02%      - .23%     - .2%
10-8                   + .19%      + .07%      - .26%     + .3%
14-9                   + .19%      - .01%      - .18%     - .5%
14-8                   + .10%      - .03%      - .07%     - 1.0%
Synthetic A            + .17%      + .14%      - .31%     + 1.4%
Synthetic B            + .12%      + .06%      - .18%     + 1.4%

Shares of Census Count 10.76%      44.24%      45.00%


Notes: (i) Group 1 includes 16 central cities. Group 2 includes three state remainders (California, Maryland, and 
Texas, excluding Group 1 cities) and 17 whole states. All areas are at least 10% Black or Hispanic. Group 
3 includes nine state remainders and 21 whole states. All Group 3 areas are less than 10% Black or Hispanic. 

(ii) The Synthetic A estimates assume that (a) Blacks have the same undercount rates as Hispanics, 5.9%; 
(b) the undercount rate of persons neither Black nor Hispanic is 0.3%; (c) the undercount rates for Blacks, 
Hispanics, and all others are invariant across geographic areas; and (d) there are 3 million undocumented 
aliens, 9.6% of whom are Black. 

(iii) Following Schirm and Preston (1987), the Synthetic B estimates assume that (a) the Black undercount rate 
is 5.9%; (b) Hispanics and other non-Blacks have an undercount rate of .7%; (c) the undercount rates 
for Blacks, Hispanics, and all others are invariant across geographic areas; and (d) there are 3 million 
undocumented aliens, 9.6% of whom are Black. 


EKT prefer the first 8 of the PEP series (pp. 933 and 938). We next compute the rms 
difference between PEP 2-20 and 3-8, which are among EKT’s preferred series. (PEP 2-20 and 
3-8 were both based on the April CPS; differences between them are due only to procedures 
for handling missing data.) 


rms difference between PEP 2-20 and 3-8 = 0.14 of 1%. (8) 


EKT also recommend averaging as a way of eliminating indeterminacies (pp. 931 and 937). 
Table 2 compares population shares from the census, the synthetic B estimates, and the average 
preferred PEP estimates. We take the rms difference between the average preferred PEP and 
synthetic B: 


rms difference between average preferred PEP and synthetic B = 0.25 of 1%. (9)


Table 2


Population Shares from the Census, the Synthetic B Estimates,
and the Average of EKT’s Eight Preferred PEP Series
(2-20, 3-20, 2-9, 3-9, 2-8, 3-8, 5-9, 5-8).


                                       Group 1    Group 2    Group 3    Total

Average Preferred PEP - Synthetic B    .30%       .04%       - .34%     .00%
Census - Synthetic B                   - .12%     - .06%     + .18%     .00%

Average Preferred PEP                  11.18%     44.34%     44.48%     100.00%
Synthetic B                            10.88%     44.30%     44.82%     100.00%
Census                                 10.76%     44.24%     45.00%     100.00%


A comparison of (7), (8) and (9) reveals three salient points: 


(a) the difference between the census and synthetic B is rather small; 

(b) the range in the preferred PEP series is larger than the difference between the census and 
synthetic B; 

(c) the difference between the average preferred PEP and synthetic B is twice the difference 
between the census and synthetic B. 
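
The three figures in (7), (8) and (9) can be checked with a short calculation. The sketch below uses the share changes from Table 1 and weights by the census shares in its last row (the displayed formula for (7) rounds these weights to .11, .44 and .45, which gives the same result to two decimals). The function name `rms` and the variable names are our own.

```python
from math import sqrt

# Census shares of the three groups, in percent (last row of Table 1).
CENSUS_SHARES = [10.76, 44.24, 45.00]


def rms(a, b, weights=CENSUS_SHARES):
    """Population-weighted root mean square difference between two adjustments."""
    total = sum(weights)
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) / total)


# Changes in shares (percentage points) for Groups 1, 2, 3, from Table 1.
census = [0.0, 0.0, 0.0]
synthetic_b = [0.12, 0.06, -0.18]
preferred = {
    "2-20": [0.52, 0.09, -0.61], "3-20": [0.51, 0.08, -0.59],
    "2-9": [0.50, 0.06, -0.56], "3-9": [0.49, 0.04, -0.53],
    "2-8": [0.41, 0.04, -0.45], "3-8": [0.39, 0.03, -0.42],
    "5-9": [0.31, 0.25, -0.56], "5-8": [0.22, 0.23, -0.45],
}
average_pep = [sum(col) / len(preferred) for col in zip(*preferred.values())]

r7 = rms(census, synthetic_b)                      # equation (7)
r8 = rms(preferred["2-20"], preferred["3-8"])      # equation (8)
r9 = rms(average_pep, synthetic_b)                 # equation (9)
print(f"(7) {r7:.2f}  (8) {r8:.2f}  (9) {r9:.2f}")
```

Run as written, this prints 0.13, 0.14 and 0.25, matching the text: two of the preferred series differ from each other by slightly more than synthetic B differs from the census, and the averaged PEP series sits nearly twice as far from synthetic B as the census does.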


EKT must view a difference of 0.13% as serious: see (7). On this scale, the PEP series do 
not agree among themselves. Furthermore, the PEP series are very different from the synthetic 
adjustment. Of course, the reason may be that Schirm and Preston did not go far enough. How- 
ever, a National Academy of Sciences review panel - with Jay Kadane as a prominent member - 
reached the tentative conclusion that Schirm and Preston already over-adjusted the census: 
see Cohen and Citro (1985, p. 287). 

The PEP estimates are in better agreement with the ‘‘synthetic A’’ adjustment in Table 1. 
But this is circular: the undercount rate for Hispanics in synthetic A was estimated from PEP,
while synthetic B was based on demographic analysis. Differences among the PEP estimates 
are an awkward reality; and so are differences between the PEP estimates and synthetic 
adjustments. 

We now quote the principal claim made by EKT (p. 927): 


‘‘Our conclusion is that regardless of whether we use one of the simple methods or the 
composite method and regardless of how we vary the assumptions of the composite 
method, an adjustment reliably reduces population shares in states with few minorities 
and increases the shares of large cities.’’ 


Giving more money to cities by changing the census counts is a good idea only if the adjust- 
ment reliably improves the accuracy of the census. Accuracy is the crucial issue, and we wish 
EKT would address it more directly. Their Table 5 is almost irrelevant. 


3. SCHIRM AND PRESTON 


Can synthetic adjustment reliably improve on the accuracy of the census? EKT think so, 
citing Schirm and Preston (1987) for the evidence. Schirm and Preston present two major 
arguments, one analytic and one based on simulation. However, both have serious flaws.


Table 3


A Counter-example to the Analytical Argument.
There are Two States and Two Races.


                  White                Black                Total
             Census    True       Census    True       Census    True
             count     count      count     count      count     count

State A          90      89            1       2           91      91
State B         910     890           99     119        1,009   1,009
Total         1,000     979          100     121        1,100   1,100


The analytic argument (p. 966): 


‘‘Our finding is that synthetic adjustment will always move the estimated ratio of a state’s
population to the national population closer to the true ratio if:

(a) the state’s black undercount is closer to the national black undercount than it is to 
the national undercount for both races combined and 

(b) the state’s white undercount is closer to the national white undercount than it is to 
the national undercount for both races combined.’’ 


As a matter of mathematics, this proposition is wrong. A counter-example is given in Table 3: 
state A, for instance, has by construction 89 whites and a census count of 90. 

The counter-example has been set up to make the arithmetic easy; more complicated and 
realistic examples could undoubtedly be provided. In Table 3, the overall error in the census 
(white plus black) is 0, for each state and for the nation. Thus, the census gets the state shares 
right, and any adjustment will make matters worse. Error rates (with the true population as 
base) are shown in Table 4: Schirm and Preston’s conditions are satisfied. Synthetic adjust- 
ment moves both states farther from truth, as shown in Table 5; state B is helped, state A is 
hurt. To compute Table 5 from Table 3, the number of whites in state A is multiplied by: 


true national total for whites / national census total for whites = 979/1,000. (10)


The arithmetic for the other cells is similar. 
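
The counter-example is small enough to verify directly. The sketch below takes the census and true counts from Table 3, applies the factor in (10) to each race, checks Schirm and Preston's two conditions for each state (with the true population as the base for undercount rates), and compares each state's share of the national total before and after adjustment. The helper names are our own.

```python
RACES = ("white", "black")

# Census and true counts from Table 3.
census = {"A": {"white": 90, "black": 1}, "B": {"white": 910, "black": 99}}
truth = {"A": {"white": 89, "black": 2}, "B": {"white": 890, "black": 119}}


def national(table, race):
    """National total for one race."""
    return sum(cells[race] for cells in table.values())


def state_share(table, state):
    """State's share of the national population, both races combined."""
    total = sum(sum(cells.values()) for cells in table.values())
    return sum(table[state].values()) / total


def undercount_rate(true_count, census_count):
    """Undercount as a fraction of the true count (negative means overcount)."""
    return (true_count - census_count) / true_count


# Synthetic adjustment: scale each race by true national total / census national total.
factor = {r: national(truth, r) / national(census, r) for r in RACES}
synthetic = {s: {r: cells[r] * factor[r] for r in RACES} for s, cells in census.items()}

national_rate = {r: undercount_rate(national(truth, r), national(census, r)) for r in RACES}
combined_rate = undercount_rate(sum(national(truth, r) for r in RACES),
                                sum(national(census, r) for r in RACES))


def conditions_hold(state):
    """Each race's state rate is closer to its national rate than to the combined rate."""
    for r in RACES:
        rate = undercount_rate(truth[state][r], census[state][r])
        if abs(rate - national_rate[r]) >= abs(rate - combined_rate):
            return False
    return True


for s in census:
    err_census = abs(state_share(census, s) - state_share(truth, s))
    err_syn = abs(state_share(synthetic, s) - state_share(truth, s))
    print(s, conditions_hold(s), round(sum(synthetic[s].values())), err_census, err_syn)
```

Both states satisfy the stated conditions, the census shares are exactly right, and the adjusted totals round to 89 and 1,011, so the adjusted shares are strictly worse for both states.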


The counter-example may be informative, as a parable: state A is sparsely populated, with 
a small minority population; state B is heavily populated, and has a large, hard-to-count 
minority population. Synthetic adjustment may favor states of type B at the expense of type A. 
The mathematical error in Schirm and Preston’s appendix appears to be in their reasoning from 
display A.2. Professor Preston informs us (personal communication) that the theorem holds, 
with a more complicated set of conditions involving weighted averages. 


Table 4


Undercounts from Table 3, in Percent.
(Negative undercounts correspond to overcounts.)


             White      Black      Total

State A      -1.1%      50%        0%
State B      -2.2%      17%        0%
Total        -2.1%      17%        0%


Table 5


The Synthetic Adjustment, ‘‘Syn’’.


                  White                Black                Total
                       True                 True                 True
             Syn       count      Syn       count      Syn       count

State A       88        89          1         2         89        91
State B      891       890        120       119      1,011     1,009
Total        979       979        121       121      1,100     1,100


This completes our discussion of the analytic reasoning in Schirm and Preston. What about 
the simulation results? Basically, Schirm and Preston consider 51 areas (the states and D.C.) and 
two races (black and white). They set up a joint distribution for an assumed ‘‘true’’ population
and the census counts; both are taken as stochastic. The census counts can be adjusted by the 
synthetic method, and the question is whether the raw counts or the adjusted counts are closer 
to the assumed true counts. Schirm and Preston actually consider several joint distributions, 
defined by different ‘‘scenarios,’’ that is, choices of parameters; the results are quite similar 
across scenarios. They also consider several loss functions, or measures of closeness. 


We focus on Scenario I, and make two brief comments. 


(a) The claimed improvement is rather modest. For example, on average, just over half the 
population lives in states whose shares are made more accurate by adjustment - no matter 
how small the improvement. 

(b) The ‘‘true’’ population was constructed on the basis of the synthetic assumption - no 
systematic variation in undercount rates within race across geography; random variation 
was allowed. See equation (2) in Schirm and Preston. Thus, the definition of ‘‘truth’’ favors 
synthetic adjustment. 


On the whole, however, Schirm and Preston have a reasonable argument. If the assumptions 
of the synthetic method more or less hold, its estimates will be good. There remains the crucial 
question: do those assumptions hold? What kind of geographical variation is there in undercount
rates? On this score, Schirm and Preston offer no evidence. In the 1980 case, the trial court 
found that ‘‘the synthetic method simply ignores geographical variations and assumes that a 
person is as likely to be missed in the census whether he lives in Alabama or in Alaska. However, 
as defendants’ experts persuasively explained, this assumption that the undercount rates for 
the various age, race, and sex groups are constant from one subnational area to another has 
no basis in fact whatsoever ... the synthetic method is simply inadequate as a means of 
adjusting the census.’’ (674 F Supp 1098, footnotes and citations omitted). 


4. ADJUSTING SMALL AREAS 


Statistical adjustment of census counts is more likely to be beneficial at fairly high levels 
of geographical aggregation (for instance, census regions or divisions). However, there are 
39,000 state and local governments in the U.S., all claimants for tax money. Many of these 
jurisdictions are further subdivided, into city council seats, etc. If census counts are to be 
adjusted, they must for legal and policy reasons be adjusted at quite fine levels of geographical 
detail. Indeed, the proposal for 1990 is to adjust down to the block level. (A ‘‘block’’ is the 
smallest unit of census geography; there are 6.5 million blocks in the U.S.).

EKT discuss two synthetic methods for adjusting subareas of the 66 study areas, as well
as a regression method (p. 941). In the end, however, there is no evidence that adjustment of 
small areas will improve on the raw census counts. With respect to 1980, EKT say (p. 943): 


‘‘For the 66 areas included in our study, we are confident of improving upon the raw
census count, especially in those areas with large undercounts or overcounts where an 
adjustment is most needed. Our findings do not permit definitive conclusions for 
suburban areas, for central cities other than the 16 included in our data set, or for other 
rural or urban parts of individual states. To compute estimates for such areas, we would 
prefer not to extrapolate from the regression equations presented in this article.’’ 


EKT go on to describe alternative designs for capture-recapture sampling, leaving open the 
question of small-area adjustment for 1990. Much of the dispute in 1980 centered on the 
feasibility of adjusting small sub-areas of the 66 study areas. To win its case, New York had 
to show such adjustments would improve on the census. EKT now seem to concede there was 
little evidence on this score. 


5. AVERAGING AND SENSITIVITY ANALYSIS


The 12 PEP series were the results of a sensitivity analysis on missing data. Since the amount 
of missing data was large relative to the undercount, methods for handling missing data have
an impact. In response, EKT offer quite a variety of procedures for adjusting the census on
the basis of the various PEP series, including: (a) eliminating discrepant series (pp. 937-9); 
(b) eliminating systematic differences between the series (pp. 937-8); (c) regression on other 
variables (the ‘‘composite’’ estimator, pp. 933ff); (d) averaging (pp. 931 and 937). 

This list makes clear the essential indeterminacy of census adjustment schemes. And in this 
context, the use of averages to reduce indeterminacy needs discussion. Arbitrary modeling 
decisions may be defensible if they do not matter - the usual robustness argument. Sensitivity 
analysis (changing the assumptions to see if the results change) may refute the robustness 
argument. However, averaging the results from a sensitivity analysis is self-defeating. The 
different PEP series are not repeated measurements of the undercount. It is the spread in the 
PEP series that is interesting, not the average - because it is the spread (among, say, the April
series) that demonstrates the impact of different modeling assumptions on the same data. 
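
A small numerical illustration of this point, using the estimated undercount rates of EKT's eight preferred series as listed in Table 1. The sketch does not separate April from August series, since that breakdown is not given here; it simply shows how much variation a single average conceals.

```python
from statistics import mean

# Estimated undercount rates (percent) for the eight preferred PEP series, from Table 1.
preferred_rates = {
    "2-20": 1.9, "3-20": 1.7, "2-9": 1.6, "3-9": 1.4,
    "2-8": 1.1, "3-8": 1.0, "5-9": 2.1, "5-8": 1.7,
}

average = mean(preferred_rates.values())
lowest = min(preferred_rates.values())
highest = max(preferred_rates.values())
spread = highest - lowest

print(f"average {average:.2f}%, range {lowest:.1f}% to {highest:.1f}%, spread {spread:.1f}")
```

The average is about 1.56%, but the series range from 1.0% to 2.1%: the highest estimate is roughly double the lowest. Reporting only the average would hide exactly the sensitivity that the different series were constructed to reveal.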


6. ASSUMPTIONS 


EKT (p. 937) say the model improves on the PEP estimates and the synthetic method. The 
model does improve on the PEP estimates, if you grant its assumptions - equations (1) through
(6) above. So far, however, these equations still seem quite implausible. Likewise, the model 
improves on the synthetic estimates only if it uses the additional variables in a sensible way, 
bringing us right back to assumptions. 

At times, EKT seem to argue that the model can be inferred from the data (pp. 933ff). Of 
course, there is more to a regression model than choice of variables on the left hand side and 
the right hand side, although that is difficult enough, as will be seen below. There are many 
questions to answer: Why are effects linear and additive - equations (1) and (2) above? What 
about the assumptions on the errors - equations (3) through (6)? And so forth. EKT put for- 
ward no evidence to justify their assumptions, except by attempting to rebut our rebuttal 
(p. 931). Do they think a model is right unless it can be proved wrong?

In any case, we stand by our critique. For some data on correlation bias, see Fay et al.
(1988, esp. sec. 6F); for a critique of Ericksen and Kadane’s estimates, see Fellegi (1985, p. 118).
Other sources of bias in the PEP series include matching errors and errors in census-day address 
reports. 

EKT argue that PEP is ‘‘conservative’’ (p. 931). This seems to be both wrong and irrelevant; 
wrong because the biases generally increase the apparent undercount; and irrelevant because
geographical variation in the biases matters a great deal. Assumption (3) is rather unlikely: 
the errors probably do not have mean 0. The undercounts estimated by PEP are likely to be 
biased upward, the size of the bias depending on the area. For a review of the evidence, see 
Fay et al., chap. 6; also see FN. The trial court in the 1980 case concluded: 


‘‘The evidence at trial established that the PEP was plagued by various errors caused by
inadequacies in the PEP methodology. This type of error is referred to as ‘bias.’ A 
significant source of bias in the PEP arises because the process of matching people from 
the CPS to the census ... is an extraordinarily difficult and inexact task. Because of 
inaccurate, irregular, and incomplete information in both the CPS and the census, the 
Bureau undoubtedly and inevitably made many errors in determining the match status 
of individuals enumerated in the CPS, thereby distorting the P-sample’s undercount 
estimate. Moreover, the evidence at trial established that most of this matching error 
occurred because the Bureau erroneously determined many cases to be misses when they 
were in fact matches. This error, therefore, resulted in the PEP overstating the under- 
count. The extent of this error and the degree to which it varies from one geographic area 
to another is unknown.’’ (674 F Supp 1100, footnotes and citations omitted). 


We turn now to equations (4) and (5). Take the independence assumption. In 1980, there 
were 3 processing offices and 12 regional offices. EKT’s counter: there were 400 district offices. 
Granted. There were also several dozen area managers, several hundred thousand census 
staff and about 1,500 CPS interviewers. The sources of error are numerous, and dependence 
seems likely. Processing offices, regional offices, managers, census interviewers, and CPS 
interviewers all must contribute components of error, to say nothing of respondents. Likewise, 
the constancy of σ² in (4) seems unlikely: different parts of the country are undercounted for
different reasons, not readily captured in a linear regression equation. 

We pointed out that random events like snowstorms might cause correlated errors in several 
areas; EKT respond that there were no snowstorms. This issue goes to the foundations of 
statistics: if the weather is good, the errors are independent; but in foul weather, all bets are 
off. The distributions in the model, and the statistical inferences, are therefore conditional 
on certain events. Which ones, and why? 

Fortunately, we do not need to resolve the problem of conditional vs. unconditional inference. 
There was a major event that disrupted census operations over several states in the Pacific Northwest. 
Mt. St. Helens erupted in May 1980, while follow-up interviewing was in full swing. 


7. OTHER ISSUES 


7.1 Does it Matter which Series is Used? 


At the level of precision EKT demand of the census, the different PEP series - even among 
their preferred ones - really do lead to quite different adjustments, as shown by equations (7) 
through (9). EKT, however, claim that the preferred PEP series all lead to similar adjustments. 
And to support their position they offer Table 11, which suggests for example that New York 
City has a differential undercount of 3.27% with an uncertainty of 0.62%. 

For many purposes, a uniform undercount would not be material; it is differential under-
counts that create inequities. The ‘‘area effects’’ seem to be measures of differential undercount - 
the policy variable of main interest. 


The ‘‘area effects’’ in the table were computed by EKT as follows: 


(i) Restrict attention to 8 of the 12 PEP series. 
(ii) Smooth each of these using the regression model. 
(iii) For each area, take the average of the 8 estimates. 
(iv) Subtract the corresponding national estimate of undercount. 
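
Steps (iii) and (iv) are simple arithmetic once the smoothed series are in hand. The following is a minimal Python sketch, assuming the smoothed estimates from step (ii) are already available as one dictionary per series. The function name, the data layout and the numbers are hypothetical and are not taken from EKT.

```python
def area_effects(smoothed_series, national_estimate):
    """Average the smoothed estimates for each area across series (step iii),
    then subtract the national estimate of undercount (step iv).

    smoothed_series: list of dicts mapping area name -> smoothed undercount (%),
                     one dict per retained PEP series (hypothetical layout).
    national_estimate: national undercount (%) on the same scale.
    """
    effects = {}
    for area in smoothed_series[0]:
        mean = sum(series[area] for series in smoothed_series) / len(smoothed_series)
        effects[area] = mean - national_estimate
    return effects


# Made-up numbers for two areas and three series, purely for illustration.
series = [
    {"Area A": 4.0, "Area B": 0.5},
    {"Area A": 5.0, "Area B": 1.0},
    {"Area A": 6.0, "Area B": 1.5},
]
effects = area_effects(series, national_estimate=2.0)
print(effects)
```

The averaging in step (iii) discards the spread across series, which is the quantity the comparison in this section turns on.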


Table 6 below compares ‘‘area effects’’ with differences in the PEP estimates, attention being 
restricted to the preferred series based on the April CPS. Differences among these PEP estimates 
are due only to differences in the handling of missing data. Taking the range seems fair: reasons 
for data to be missing can differ from area to area, and so will the appropriate imputation 
procedure. Adding in the August series would increase the range, but some of the difference 
would be due to sampling error. 

The table shows that for some areas, the effects are large relative to differences between 
PEP series, suggesting that missing data have little impact on the results. Upstate New York 
is an example. But for other areas, like Chicago, the reverse holds and imputation procedures 
matter. 

All 66 areas are plotted in Figure 1. The x-axis shows the area effect; the y-axis shows 
the range in the preferred April PEP series. In root mean square (across the 66 areas), the 
spread among EKT’s preferred PEP series - based on the April CPS - is about 75% of the 
area effect. In other words, the impact of missing data (never mind other biases in PEP) is 
similar in magnitude to the effect EKT are trying to measure. Bringing in alternative imputa- 
tion models would make matters even worse. Nor is averaging the results a good fix, for reasons 
given earlier. 
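
The root-mean-square comparison can be sketched in a few lines of Python. The example below uses only three rows of Table 6 (Alabama, San Diego, Rest of New York), so it illustrates the calculation and does not reproduce the 75% figure, which is taken over all 66 areas. The function name `rms` is ours.

```python
import math


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# (min, max, area effect) for three rows of Table 6, in percent.
rows = {
    "Alabama": (-0.37, 0.60, -1.07),
    "San Diego": (-0.98, 1.45, 0.65),
    "Rest of New York": (-1.61, -1.44, -2.55),
}
ranges = [hi - lo for lo, hi, _ in rows.values()]      # spread among April series
effects = [effect for _, _, effect in rows.values()]   # EKT area effects
ratio = rms(ranges) / rms(effects)
print(round(ratio, 2))
```

A ratio near 1 means the spread due to the handling of missing data is about as large as the effect being measured.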


Table 6 


Comparing Area Effects with Differences in the PEP Estimates, 
Restricted to Preferred Series Based on the April CPS. 
Subareas Match those used in FN. 


                      Preferred April PEP series
                                                              Area
                        Min.         Max.        Range       effect

Alabama                 -.37          .60          .97        -1.07
Alaska                  2.79         3.53          .74         1.63
Los Angeles             4.56         7.72         3.16         3.16
San Diego               -.98         1.45         2.43          .65
San Francisco           4.31         6.25         1.94   [illegible]
Rest of California      2.84         3.92         1.08          .03
Chicago                 3.57         6.56         2.99   [illegible]
Rest of Illinois  [illegible]  [illegible]  [illegible]       -1.04
New York City           6.04         7.90         1.86         3.27
Rest of New York       -1.61        -1.44          .17        -2.55


Wyoming                 3.91         4.04          .13         1.16


14 Freedman and Navidi: Should We Have Adjusted the Census of 1980? 


Figure 1. PEP and data quality. For each of the 66 study areas, the horizontal axis shows the EKT
‘‘area effect’’ (EKT Table 11). The vertical axis shows the range in the preferred April PEP series.


The positive association in Figure 1 is quite striking, and so is the change in the joint distri- 
bution when the area effect changes from negative to positive. Our explanation: PEP estimates 
of undercount are indicators of poor data quality - in PEP as well as the census. Large apparent 
undercounts indicate areas with poor data. In such areas, there is a lot of missing data, so the 
effect of changing the imputation rules will be large too. Areas that are hard to count are also 
hard to adjust. See FN p. 9 or Wolter (1986, p. 26, points 8 and 9). 

There may be some reasonable way of choosing a compromise version among the PEP series. 
But why are any of the PEP series, or their averages, an improvement over the census? That 
is the crucial question, and EKT do not answer it. In our view, adjustment - whether by a 
synthetic method, or a PEP series, or a regression model, or any convex combination - will 
in the end be driven mainly by assumptions. 


7.2 Which PEP Series is Best, and which Explanatory Variables should be Used? 


At trial, and in their discussion of FN, Ericksen and Kadane recommended an adjustment 
based on PEP 2-9, apparently the most preferred of all 12 series. We chose PEP 10-8 as an 
alternative for study. EKT defend 2-9, and try to exclude 4 of the series - especially our foil 
10-8. The arguments were reviewed in court and in FN (p. 8, the discussion, and the rejoinder 
p. 36). Our opinion remains the same: there is no rational basis for choosing 2-9 over 10-8. 

EKT impute to us the position that ‘‘proportion urban’’ should have been considered as an 
independent variable (p. 934). This is not quite right. We felt that EKT’s choice of independent 
variables was somewhat arbitrary, and wanted to show that changing variables made a real 
difference to the results - another sensitivity analysis. The difference was observed mainly for
small areas (FN, p. 9). Since EKT no longer advocate adjustment of small areas in 1980, this 
argument may be moot. 

There is one new twist to the reasoning: EKT argue for choosing models by ‘‘reliance on
statistical criteria’’ (p. 941). In essence, they recommend choosing variables so as to minimize
the rms residual in an OLS fit. However, the rms residual measures association in the data,
not correctness of underlying theory.

For reasons that remain unclear, EKT restrict attention to models with 2, 3, or 4 variables; 
and they require coefficients to have t-statistics of 2 or more. Their preferred equation seems
to be: 


PEP 2-9 = -2.23 + .079 min + .036 crime + .028 conv + residual      (11)
          (-4.0)   (5.4)       (3.6)         (3.5)


rms residual = 1.53.


The right hand side variables are the percent minority in the study area, the crime rate, and
the percent conventionally enumerated; t-statistics are shown in parentheses; the rms residual
is computed using the unbiased divisor n - p. This equation is used only for variable selection;
after the variables are chosen, the model is refitted by GLS: see (1-6) above, and FN for
discussion.

The statistical logic is not apparent, and EKT’s criteria have to be read quite literally. For 
example, here is another candidate equation: 


PEP 2-9 = .120 min + .026 crime + .029 conv - .176 pov + residual      (12)
          (7.6)      (3.4)        (3.8)       (-4.4)


rms residual = 1.49. 


The additional variable is the percentage of persons in the study area with incomes below 
the poverty level; the intercept was suppressed because the t-statistic was small. Equation (12)
fits a little better than (11) in terms of rms residual, and ‘‘shows’’ that the undercount goes 
down as the percentage of poor people goes up - other things being equal. EKT reject this equa- 
tion because the coefficient of ‘‘pov’’ is significantly negative rather than significantly positive. 

Preconceptions about the undercount may be incompatible with the data, and best-subsets 
OLS may not be a suitable analytic technique. We reject neither interpretation, but our main 
conclusion is this. In the present context there are no objective, statistically defensible criteria 
for model selection. Much rides on the subjective judgment of the modeler. 

With this in mind, we return to the points at issue - choosing a PEP series, and deciding 
between the crime rate or the percent urban as explanatory variables. As far as we can see, 
on the criteria chosen by EKT, the difference between crime rate and percent urban is trivial. 
And PEP 10-8 is clearly better than 2-9. See Table 7. 

On pages 935 and 940 of EKT, σ denotes the rms residual. There is some conflict in notation,
because we wrote σ² for Var(ε) in equations (2) and (4), following Ericksen and Kadane
(1985, p. 105) or FN (p. 5). To avoid conflict, let SE(ε) be the estimated value for our σ; this
is what controls the standard errors of the 66 area undercounts computed by the Ericksen-Kadane
model, as shown by equations (8) and (10) in FN. For PEP 10-8, the estimated SE(ε)
is virtually 0, so a model based on 10-8 fits extremely well and the 66 area undercounts are
very precisely estimated (Table 8).


Table 7


RMS Residuals from Regression Equations for PEP 2-9 and PEP 10-8.
Explanatory Variables Include Percent Minority,
Percent Conventionally Enumerated,
and Either the Crime Rate or the Percent Urban.


              Crime rate     Percent urban
PEP 2-9          1.55            1.54
PEP 10-8      [illegible]     [illegible]


Table 8


SE(ε) and the RMS for the 66 Study Areas; PEP 2-9 and PEP 10-8.
The Models Include Percent Minority, Percent Conventionally Enumerated,
and Either the Crime Rate or the Percent Urban.


                   Crime rate                 Percent urban
              SE(ε)      rms area SE      SE(ε)      rms area SE
PEP 2-9    [illegible]       .65           .76           .65
PEP 10-8      .00            .28           .00       [illegible]


Notes: Let $K$ be a 66 × 66 diagonal matrix, whose $(i,i)$ element is $K_i$. Let $X$ be the 66 × 4 matrix of
explanatory variables. Let $H = X (X^{T} X)^{-1} X^{T}$ and $\Gamma^{-1} = K^{-1} + \mathrm{SE}(\epsilon)^{-2} (I - H)$. The
66 area undercounts are estimated by the Ericksen-Kadane model as $\Gamma K^{-1} y$, where $y$ is
the 66 × 1 vector of PEP estimates. The rms SE for the 66 study areas is $\sqrt{\mathrm{trace}\,\Gamma / 66}$. For
details, see FN. At trial, Ericksen and Kadane estimated SE(ε) from 51 study areas (whole states
and DC); we followed suit in FN. Here, we use the 66 study areas, since that seems to be EKT’s
current recommendation. The difference is noticeable.
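
The role of SE(ε) in these formulas can be seen from two limiting cases. The following is a sketch derived from the definitions in the notes, under the usual assumption that $X$ has full column rank; it is not taken from FN or EKT.

```latex
\begin{align*}
\Gamma^{-1} &= K^{-1} + \mathrm{SE}(\epsilon)^{-2}\,(I - H),
  & H &= X (X^{T} X)^{-1} X^{T}, \\
\text{as } \mathrm{SE}(\epsilon) \to \infty: \quad
  \Gamma &\to K,
  & \Gamma K^{-1} y &\to y, \\
\text{as } \mathrm{SE}(\epsilon) \to 0: \quad
  \Gamma &\to X (X^{T} K^{-1} X)^{-1} X^{T},
  & \Gamma K^{-1} y &\to X (X^{T} K^{-1} X)^{-1} X^{T} K^{-1} y .
\end{align*}
```

In the first case there is no smoothing, and the rms SE is that of PEP itself. In the second, the estimates are the GLS fitted values and $\Gamma$ reflects only the uncertainty in the fitted coefficients, which is consistent with an estimated SE(ε) near 0 producing small nominal standard errors.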


On ‘‘statistical criteria,’’ contrary to the claims made by EKT, 10-8 is preferred to 2-9 and
percent urban is just as good an explanatory variable as the crime rate. Their qualitative critique 
seems off the mark too. Of course, different urban areas are different, just as EKT say. So 
are different central cities. Similarly, minority persons living in central cities are likely to be 
different from those in suburbs. And so forth. All of EKT’s variables are ‘‘blurred predictors’’
of undercount, and some are blurrier than the percent urban (p. 934). 


With respect to this set of issues, the judge in the 1980 case was harder on Ericksen and 
Kadane than we are: 


‘‘Moreover, as defendants’ experts persuasively explained, no one series of PEP estimates 
can be reliably shown to be superior to the others, or indeed, to the census itself, because 
there is insufficient knowledge with respect to which PEP procedures are better suited 
for measuring census undercount. While two of plaintiffs’ experts expressed a preference 
for the ‘series 2-9’ PEP estimates based upon the hypothesis that the PEP procedures 
employed in arriving at those estimates were superior to the procedures used for the other 
PEP estimates, the plaintiffs’ experts offered nothing more than unsupported assumptions 
in support of that position. On the other hand, the defendants’ experts offered equally 
plausible assumptions which favored different PEP procedures, producing dramatically 
different PEP estimates.’’ (674 F Supp 1102, footnotes and citations omitted.) 


7.3 Simulation Studies


We had a simulation study making three points: (a) you could not infer from the data which 
variables go into the model, (b) standard errors depend on assumptions about disturbance 
terms, and (c) the standard errors computed by Ericksen and Kadane were quite optimistic. 
We had two additional points on this topic: (d) standard errors do not measure the impact of 
bias; (e) the Ericksen-Kadane smoothing simply passes through any bias in PEP that is well 
related to the explanatory variables. 

Points (a) through (e) are real obstacles to showing that the model improves on the PEP 
estimates. EKT do not comment on points (b), (d) and (e). They deny (a), but more or less 
concede point (c). For our part, we concede that in our simulation - which grants half the 
model - regression does reduce sampling error. We still think (a) is right, as will be argued 
below. And in other contexts, smoothing may actually increase sampling error (Ylvisaker
1991, p. 7).

EKT (p. 943) criticize our study, because it covered only models with three variables in the
equation and did not restrict the t-statistics. So we repeat the simulation here. In essence, we
take PEP 10-8 as ‘‘truth,’’ and add for each of the 66 study areas $i$ a random error with variance
$K_i$, as in (4). This grants equation (1) and the assumptions on $\delta_i$. We choose variables
according to the procedure outlined by EKT (p. 935), and fit the regression model, repeating
the whole process 100 times.

Table 9 shows the variables selected in the first 10 runs. As will be seen, there is no consistency - 
except that the percentage ‘‘conventionally enumerated’’ always comes in. Over the 100 runs 
- excluding the ones that produced no acceptable model - the nominal rms error was about 
30% too small, and improvement of the composite estimator over PEP was exaggerated by 
a factor of 1.75. Assumptions matter. 
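
The logic of such an experiment can be sketched in Python. What follows is a deliberately simplified analogue, not the procedure used in the study: it considers one-variable models only, applies no screen on t-statistics, selects by smallest rms residual, and runs on hypothetical data. All names (`rms_residual`, `selection_counts`, `x1`, `x2`) are ours.

```python
import math
import random


def rms_residual(x, y):
    """RMS residual (divisor n - 2) from OLS of y on an intercept and x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    rss = sum((b - mean_y - slope * (a - mean_x)) ** 2 for a, b in zip(x, y))
    return math.sqrt(rss / (n - 2))


def selection_counts(truth, candidates, noise_sd, runs, seed=0):
    """Count how often each candidate gives the smallest rms residual when
    independent noise with the given standard deviations is added to 'truth'."""
    rng = random.Random(seed)
    counts = {name: 0 for name in candidates}
    for _ in range(runs):
        y = [t + rng.gauss(0.0, sd) for t, sd in zip(truth, noise_sd)]
        best = min(candidates, key=lambda name: rms_residual(candidates[name], y))
        counts[best] += 1
    return counts


# Hypothetical data: 66 areas, two correlated candidate predictors.
rng = random.Random(1)
x1 = [rng.uniform(0, 40) for _ in range(66)]
x2 = [a + rng.gauss(0.0, 5.0) for a in x1]      # a noisy proxy for x1
truth = [0.05 * a + 0.05 * b for a, b in zip(x1, x2)]
candidates = {"x1": x1, "x2": x2}

noisy = selection_counts(truth, candidates, noise_sd=[1.5] * 66, runs=100)
noiseless = selection_counts(truth, candidates, noise_sd=[0.0] * 66, runs=100)
print(noisy, noiseless)
```

Without noise the same variable is chosen on every run; with noise, the split between the two candidates shows how far the choice depends on the random errors and not on the underlying ‘‘truth.’’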


Table 9
A Simulation Experiment on Variable Selection; PEP 10-8 is Taken as ‘‘Truth.’’


Run CC Min Crime Conv Ed Pov Lang MU
1 x x x
2 x x
3 x x
4 x x x
5 x x
6 x x
7 x x x
8 x x
9 x x
10 There was no model satisfying EKT’s criteria



Notes: CC is an indicator for central cities; Min, the percentage of minorities; Crime, the crime rate; Conv, the 
percentage who were conventionally enumerated; Ed, the percentage with no high school degree; Pov, the 
percentage below the poverty line; Lang, the percentage who have difficulty with English; MU, the percentage 
living in multiple-unit housing. 


Table 10


A Simulation Experiment on Variable Selection.
PEP 2-9 is Taken as ‘‘Truth’’; Percent Urban (Urb) is Permitted as an Explanatory Variable.
The Table Shows the Number of Times Each Variable is Entered, and the Average
of its Coefficient (Over the Times it Enters); 100 Data Sets were Generated.


Variable    Times entered    Average coefficient
CC               17               2.954
Min              82               0.071
Crime            53               0.053
Conv             93               0.028
Ed                5               0.085
Pov               1               0.135
Lang             17            [illegible]
MU                0               *****

Urb              23               0.060


A minor digression on census procedures. ‘‘Conventional enumeration’’ means that 
respondents were asked to fill out the forms and hold them for collection by an enumerator; 
this process was used in largely rural areas, particularly in the west. Conv is the percentage 
of persons living in areas that were conventionally enumerated. (In urban areas, forms were 
to be mailed back.) The undercount in 1980 was relatively high in rural areas, probably due 
to incomplete maps and address lists; that may be why conv is such a powerful explanatory 
variable. 

We did an additional simulation with PEP 2-9 taken as truth, allowing percent urban to 
be selected as an explanatory variable. The results are shown in Table 10. Again, the percent 
conventionally enumerated comes in as does the percent minority. Otherwise, there is a fair 
degree of inconsistency. And the much-maligned percent urban is chosen more often than 5 
of EKT’s variables, including the central-city indicator. The data do not determine the model. 


7.4 The Regression Model at Trial 


As statisticians, we are intrigued by arguments about regression. However, the court was
not impressed: 


‘‘In their rebuttal case, the plaintiffs argued that the application of regression analysis
to the undercount estimates derived from the PEP would enable the Bureau to use the 
PEP to accurately adjust the 1980 census. However, both plaintiffs’ and defendants’ 
experts agreed that regression analysis will not in any way alleviate the bias in the PEP 
and plaintiffs apparently do not contend otherwise. In short, while regression analysis 
may remove some of the random sampling error in the PEP, regression analysis will not 
reduce the substantial errors in the PEP caused by erroneous matches, the untested 
assumptions made with respect to the unresolved cases, and correlation bias. Moreover, 
the overwhelming weight of the evidence supports the conclusions of defendants’ experts 
that the principal difficulties with the PEP stem from these biases rather than from 
sampling error.’’ (674 F Supp 1103, footnotes and citations omitted.) 


8. SUMMARY AND CONCLUSION


Ericksen, Kadane, and Tukey argue that they can improve on the 1980 census counts by 
statistical adjustment. They seem now to agree that adjustments would not have been justified 
for subareas of the 66 PEP study areas. With respect to the 66 areas themselves, disagreement 
remains. In our opinion, success of any of EKT’s proposed adjustments rides on unverified 
and implausible assumptions - about missing data, undercount mechanisms, bias in PEP, and
stochastic errors in regression models. Changing the assumptions changes the results, and taking
averages over various sets of assumptions does not, at least in our opinion, make the problem
go away. EKT conclude (p. 943):


‘‘We believe that the Census Bureau creates political difficulties for itself when it ignores
the undercount. The bureau will put itself in a better position by making its best effort,
using available statistical and demographic methods, to adjust for the undercount. Errors
will remain, but they will be smaller and we will no longer know in advance who is losing
money and power because of the undercount.’’


This political analysis has merit, but there are caveats. We think it quite unfair to say that 
the Bureau has ignored the undercount. Nor are the Bureau’s political difficulties entirely of 
its own creation. Adjustments can indeed be devised to satisfy particular groups or settle indi- 
vidual law suits. However, the census is used to share out fixed resources, so there will always 
be losers as well as winners. These will have little trouble identifying themselves, after the fact 
if not before. And up to now, the goal of improving on the accuracy of the census by statistical 
adjustment has proved illusory. 


9. HOW DID THE COURT RULE? 


At the time of writing, litigation about the 1990 census goes on. With respect to the 1980 
census, however, the court ruled for the defendants on all the issues. We quote from the digest 
and opinion, Cuomo et al. v. Baldrige et al., 674 F. Supp. 1089-1108 (SDNY 1987).


‘‘State, city, and their officials brought action against Secretary of Commerce, Director
of the Bureau of the Census, and other officials seeking statistical adjustment of 1980
decennial census. The District Court, Sprizzo, J., held that state and city failed to establish
that statistical adjustment of decennial census was technically feasible.’’

‘‘... it is essential to any such adjustment that a technically feasible adjustment
methodology exist which gives a truer picture of the United States population on a state-
by-state basis for apportionment purposes, and a sub-state-by-sub-state basis for federal
funding purposes ... If it does not, then no adjustment can or should be made ...
because ... both congressional seats and revenue sharing funds are fixed quantities, and
an increase in the population in one state or sub-state area will adversely affect the shares
of other localities ...’’

‘‘Notwithstanding the complexity of the facts ... this action presents one issue to be
resolved by the Court: whether the plaintiffs have sustained their burden of proving that
a statistical adjustment of the 1980 census will result in a more accurate picture of the
proportional distribution of the population of the United States on state-by-state and
sub-state-by-sub-state basis than the unadjusted census. The Court finds as a matter of
fact that the plaintiffs have not sustained that burden, and the action must therefore be
dismissed ...’’


APPENDIX
Synthetic Estimation and Loss Functions 


Synthetic Estimation 


Section 5 in Wolter and Causey (1991) describes their empirical proof that synthetic adjust- 
ment would have brought the 1980 census closer to truth. The evidence is a simulation study: 
the ‘‘census’’ and ‘‘truth’’ are both defined in terms of an artificial reference population developed
by Isaki et al. (1987). However, the argument depends rather strongly on the reference
population, as shown by Passel (1987). The object here is to sketch a variation on one of Passel’s 
examples. Indeed, if the reference population is defined by using PEP 2-9 to correct the 1980 
census, then synthetic adjustment moves the counts farther from truth. 

Table 11 shows the data for the four census regions - Northeast, Midwest, South, and West. 
With squared differences in population shares weighted by size, 


r.m.s. difference between Synthetic B and PEP 2-9 = 0.21 of 1%. (13) 


r.m.s. difference between the Census and PEP 2-9 = 0.15 of 1%. (14) 
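
Equations (13) and (14) can be reproduced from the population shares in Table 11. The Python sketch below takes ‘‘weighted by size’’ to mean weights proportional to the census shares; that reading is our assumption, and the function name is ours.

```python
import math

# Population shares in percent, from Table 11 (Northeast, Midwest, South, West).
pep_2_9 = [21.59, 25.92, 33.15, 19.34]
synthetic_b = [21.67, 25.95, 33.39, 18.99]
census = [21.69, 25.98, 33.27, 19.06]


def weighted_rms_difference(shares, reference, weights):
    """Square root of the weighted mean of squared share differences.
    Weights are normalized to sum to 1."""
    total = sum(weights)
    mean_square = sum(w / total * (s - r) ** 2
                      for s, r, w in zip(shares, reference, weights))
    return math.sqrt(mean_square)


# Assumption: weights proportional to census shares.
rms_synthetic = weighted_rms_difference(synthetic_b, pep_2_9, census)
rms_census = weighted_rms_difference(census, pep_2_9, census)
print(round(rms_synthetic, 2), round(rms_census, 2))  # 0.21 0.15
```

Weighting by the PEP 2-9 shares instead gives the same two figures to two decimal places, so the comparison does not hinge on this choice.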


PEP 2-9 is rather close to the ‘‘average preferred PEP’’ in Table 2. In that table, the census
was closer to synthetic B than to the PEP estimates. In Table 11, the census is closer to PEP, 
and synthetic B is the outlier. The difference between the two tables seems to be the disaggrega- 
tion. Table 2 disaggregates the U.S. by race and ethnicity; Table 11, according to conventional 
census geography. 

Of course, using another disaggregation or a different synthetic adjustment could reverse 
the comparisons yet again; so could a change in the loss function. To illustrate the possibilities, 
consider adjusting the 66 PEP study areas, rather than four regions. Keep PEP 2-9 as ‘‘truth.’’
Using the loss function (17), the census is preferred to synthetic B, by a little. Using (16), 
synthetic B shows a much smaller loss than the census. 


Table 11 


Population Shares from The Census, The Synthetic B Estimates, 
and PEP 2-9, in Percent; Census Counts, in 1,000s 


                         Northeast    Midwest     South      West       Total
Synthetic B - PEP 2-9       .08%        .03%       .24%      -.35%       .00%
Census - PEP 2-9            .10%        .06%       .12%      -.28%       .00%

PEP 2-9                   21.59%      25.92%     33.15%     19.34%    100.00%
Synthetic B               21.67%      25.95%     33.39%     18.99%    100.00%
Census                    21.69%      25.98%     33.27%     19.06%    100.00%

Census count              49,135      58,866     75,372     43,172    226,545


Loss Functions


Proponents of adjusting the 1990 census make analytic arguments based on loss functions: 
see Wolter and Causey (1991) or Ericksen, Estrada, Tukey and Wolter (1991, p. 20 of the main 
report; Appendices G and H). The essence of the argument can be summarized in the lemma which
follows. To set up the notation: the country is divided into $n$ areas, indexed by $i$; $c_i$ is the census
count in area $i$ and $t_i$ is the true count. The ‘‘synthetic estimate’’ for area $i$ is $x_i = \lambda c_i$, where
the ‘‘adjustment factor’’ $\lambda$ is computed from other data.


Lemma. Fix $t_1, \ldots, t_n$ and $c_1, \ldots, c_n$. Let $T = \sum_i t_i$, $C = \sum_i c_i$ and $X = \lambda C$.      (15)


Then

$$\sum_i (x_i - t_i)^2 / c_i \tag{16}$$

is minimized when

$$X = T.$$

The proof is omitted as trivial. The ‘‘loss function’’ defined by (16) differs in detail from 
the one used in (7), (8), (9), (13) and (14), which can be written as 


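$$\sum_i \frac{c_i}{C} \left( \frac{x_i}{X} - \frac{t_i}{T} \right)^2 \tag{17}$$

with $C = \sum_i c_i$, $X = \sum_i x_i$ and $T = \sum_i t_i$.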


The loss function (17) emphasizes shares while (16) emphasizes counts; furthermore, (17) 
puts more weight on large sub-populations while (16) does the opposite, due to the division 
by c;. We are not particularly attached to (17), and see no good way to choose one loss 
function rather than another. 
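
A sketch of the omitted argument for the lemma, in the notation above, with $x_i = \lambda c_i$, $C = \sum_i c_i$ and $T = \sum_i t_i$ (our reading of the lemma is that the minimizing factor is $\lambda = T/C$):

```latex
\begin{align*}
L(\lambda) &= \sum_i \frac{(\lambda c_i - t_i)^2}{c_i}, \\
L'(\lambda) &= 2 \sum_i (\lambda c_i - t_i) = 2(\lambda C - T), \\
L''(\lambda) &= 2C > 0 .
\end{align*}
```

So the loss (16) is minimized at $\lambda = T/C$, that is, when the adjusted total $X = \lambda C$ equals the true total $T$.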

Lemma (15) is mathematically correct, but it is so far removed from the realities of adjusting 
the 1990 census that it seems virtually irrelevant. In this connection, there are four points to 
consider: 


(a) The true population total T is unknown; Wolter and Causey attempt to deal with this 
problem, but the example in Table 11 refutes their argument: synthetic adjustment makes 
the 1980 census less accurate. 

(b) Synthetic estimates do not perform well under aggregation. 

(c) At the block level, rounding error may dominate. 

(d) Loss functions only capture part of the policy problem, and may obscure more than they 
reveal. 

Points (b), (c) and (d) will be discussed in more detail; but first, a brief review of proposed
methods for adjusting the 1990 census. The population is divided into 1,392 ‘‘post strata,’’
e.g. male hispanic renters age 30-44 in central cities in the Pacific Division. Index these post
strata by $j = 1, \ldots, 1{,}392$. For each post stratum $j$, an adjustment factor $\lambda_j$ is computed by
capture-recapture techniques from data collected in a Post Enumeration Survey (Freedman 1991).

The 1,392 factors are used to adjust all small-area counts as follows. Fix an area, e.g. a town.
This area will intersect many of the post strata. The census count for each area × post stratum
intersection is multiplied by the corresponding $\lambda_j$, and the products are summed. In other
words, subpopulations are adjusted by the synthetic method, and synthetic estimates are 
aggregated to obtain totals for small areas. 
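
The aggregation just described can be written out in a few lines of Python. This is a minimal sketch with three hypothetical post strata and made-up factors and counts; none of the numbers come from the Post Enumeration Survey.

```python
def synthetic_estimate(counts_by_stratum, factors):
    """Synthetic estimate for one area: multiply the census count in each
    area x post-stratum cell by that stratum's factor, then sum."""
    return sum(count * factors[stratum]
               for stratum, count in counts_by_stratum.items())


# Hypothetical adjustment factors for three post strata (not real PES values).
factors = {"stratum_1": 1.05, "stratum_2": 1.01, "stratum_3": 0.99}

# Hypothetical census counts for one town, broken down by post stratum.
town = {"stratum_1": 200, "stratum_2": 1500, "stratum_3": 300}

census_total = sum(town.values())
adjusted_total = synthetic_estimate(town, factors)
print(census_total, round(adjusted_total, 1))
```

The same factor is applied to a post stratum in every area, whatever the local conditions; that is the defining feature of the synthetic method.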

This completes a sketch of the adjustment process, and we return to points (b), (c) and (d). 

(b) Synthetic estimates do not perform well under aggregation. This was already pointed 
out by Fellegi (1985). See Cohen and Citro (1985, p. 318). For another example, see Tables 
3 to 5 above. 

(c) At the block level, rounding error may dominate. Census adjustment would in fact be 
done at the block level. (A ‘‘block’’ is the smallest unit of census geography; there are 6.5 million 
blocks in the country.) A typical block in an urban area may intersect 25 post strata; each 
block × post stratum intersection contains only a handful of people. Multiplying by an adjustment
factor means adding or subtracting a fractional number of people, and the fractions would
be rounded. The next example illustrates how rounding error may offset any advantage from 
synthetic adjustment. 

Suppose there are n ‘‘areas’’ to adjust; these could be viewed as blocks intersected with one 
fixed post stratum. Suppose each of these areas has the same census count, c. Fix m < n. 
Suppose that in each of m areas, the census has missed one person; in the remaining n — m 
areas, the census count is exactly right. In all, there is an undercount of m people. These facts 
are considered as known; but it is not known which blocks have the missing people. According 
to (16), 


loss from using the unadjusted census = m/c. (18) 


Adjustment would proceed as follows: choose m areas at random, and add one person to 
each of these areas. Clearly, the expected loss from adjusting is 


$$2 \left(1 - \frac{m}{n}\right) \frac{m}{c}. \tag{19}$$

Lemma. If $m < n/2$, there is an expected net loss from synthetic adjustment.

Proof. If $m < n/2$, then

$$2 \left(1 - \frac{m}{n}\right) \frac{m}{c} > \frac{m}{c}. \tag{20}$$
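
The expected loss in (19), read here as $2(1 - m/n)(m/c)$, can be checked by brute force for small $n$. The Python sketch below enumerates every way of choosing $m$ areas out of $n$ and averages the loss (16) exactly, using rational arithmetic; the function names are ours.

```python
from fractions import Fraction
from itertools import combinations


def census_loss(m, c):
    """Loss (16) for the unadjusted census: m areas each short by one person."""
    return Fraction(m, c)


def expected_adjusted_loss(n, m, c):
    """Expected loss (16) when one person is added to m areas chosen at random,
    computed by enumerating every possible choice of m areas out of n."""
    missed = set(range(m))          # without loss of generality, the first m areas
    total = Fraction(0)
    n_choices = 0
    for chosen in combinations(range(n), m):
        # Areas off by one person: missed but not chosen, or chosen but not missed.
        wrong = len(missed.symmetric_difference(chosen))
        total += Fraction(wrong, c)
        n_choices += 1
    return total / n_choices


n, m, c = 6, 2, 10
print(census_loss(m, c), expected_adjusted_loss(n, m, c))   # 1/5 4/15
```

With $m = 2 < n/2 = 3$, the expected loss after adjustment (4/15) exceeds the loss from leaving the census alone (1/5), as the lemma states.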

Of course, this example is almost as stylized as Lemma (15). In short, the value of census
adjustment cannot be established by a priori argument. 

(d) Loss functions capture only part of the policy problem, and may obscure more than 
they reveal. To begin with an example, suppose that the census is in error, and the main impact 
of that error is to transfer a congressional seat from California to Pennsylvania. There is a 
gain for Pennsylvania, and a loss for California. There may be a net social loss from this 
misallocation, but attempting to quantify that loss by (16) - or any similar formula - seems 
quite simplistic. 

We now present another example to illustrate point (d). To focus the issue, suppose the census 
undercount is largely confined to blacks and hispanics in New York, Chicago, Houston and 
Los Angeles. The census, by assumption, under-estimates the share of the population living 
in these four cities, and adjustment will partly correct that error. 

Due to its reliance on the synthetic method, however, adjustment will change population 
shares everywhere. Areas which are heavily black and hispanic will have their population shares
artificially increased, at the expense of other areas. This will be so even in regions of the country 
where the census was accurate. 

In this example, the distribution of resources between the four cities and other areas may 
be made fairer by adjustment - at the expense of distortions introduced everywhere else. The 
loss-function approach slides over this difficulty. Balancing inequities is a political problem, 
not easily resolved by a statistical formula. 

Some observers may consider the example to be extreme. However, the Post Enumeration 
Survey only samples 5,000 blocks, and there are 39,000 jurisdictions to adjust. Real information 
about the undercount is necessarily confined to relatively few localities. Adjustments for other 
areas must therefore be based largely on theory rather than data. 


REFERENCES 


CITRO, C.F., and COHEN, M.L. (Eds.) (1985). The Bicentennial Census: New Directions for
Methodology in 1990. Washington, D.C.: National Academy Press.


ERICKSEN, E.P., and KADANE, J.B. (1985). Estimating the population in a census year (with discussion). 
Journal of the American Statistical Association, 80, 98-131. 


ERICKSEN, E.P., KADANE, J.B., and TUKEY, J.W. (1989). Adjusting the 1980 census of population 
and housing. Journal of the American Statistical Association, 84, 927-943. 


ERICKSEN, E.P., ESTRADA, L.F., TUKEY, J.W., and WOLTER, K.M. (1991). Report on the 1990 
Decennial Census and the Post Enumeration Survey, submitted to the Secretary of the Department 
of Commerce, June 22, 1991. 


FAY, R.E., PASSEL, J.S., ROBINSON, J.G., and COWAN, C.D. (1988). The Coverage of the 
Population in the 1980 Census. Washington, D.C.: U.S. Department of Commerce, Government 
Printing Office. 


FELLEGI, I. (1985). Comment. Journal of the American Statistical Association, 80, 116-119. 


FREEDMAN, D.A. (1991). Adjusting the 1990 census. Science, 252, 1233-1236. Copyright 1991 by the 
AAAS. Excerpted by permission. 


FREEDMAN, D.A., and NAVIDI, W.C. (1986). Regression models for adjusting the 1980 census 
(with discussion). Statistical Science, 1, 1-39. 


HOGAN, H., and WOLTER, K. (1988). Measuring accuracy in a post-enumeration survey. Survey 
Methodology, 14, 99-116. 


ISAKI, C., DIFFENDAHL, G., and SCHULTZ, L. (1987). Report on statistical synthetic estimation 
for small areas. Technical report, Bureau of the Census. 


PASSEL, J. (1987). A note about synthetic estimates of undercount. Memorandum, U.S. Bureau of the 
Census. 


SCHIRM, A.L., and PRESTON, J. (1987). Census undercount adjustment and the quality of geographic 
population distributions (with discussion). Journal of the American Statistical Association, 82, 965-990. 


SCHIRM, A.L. (1991). The effects of census undercount adjustment on congressional apportionment. 
Journal of the American Statistical Association, 86, 526-541. 


WOLTER, K. (1986). Comment. Statistical Science, 1, 24-28. 
WOLTER, K. (1991). Accounting for America’s uncounted and miscounted. Science, 253, 12-15. 


WOLTER, K., and CAUSEY, B. (1991). Evaluation of procedures for improving population estimates 
for small areas. Journal of the American Statistical Association, 86, 278-284. 


YLVISAKER, D. (1991). A look back at TARO. Technical report, Department of Mathematics, UCLA. 


Survey Methodology, June 1992


COMMENT 


STEPHEN E. FIENBERG


Freedman and Navidi give their current thought-provoking retrospective on the issue of 
undercount in the 1980 U.S. decennial census. Unfortunately they fail to address the question 
posed in the title of their paper and instead attempt to vindicate their views expressed earlier 
in Freedman and Navidi (1986) and to rebut commentaries on these views by others. Their theme 
is a familiar one to those who have read earlier versions of the debate connected with the “1980
lawsuit” over adjustment: The census is very complex and only a small undercount is thought
to remain; adjustment utilizes statistical modelling that relies on unverifiable assumptions; a 
bad adjustment may be worse than nothing. 

I disagree with many of the views expressed by the authors and believe that they distort both
what should have been at issue with respect to 1980 and what appears to be at issue in litigation 
currently pending over correction of the 1990 census. In the following, I attempt to explain 
my differences with the authors and give my perspective on two questions: the one raised in 
the title and the one implicit in the material introduced regarding the 1990 census. (Note: The 
author played no part in the litigation over the adjustment of the 1980 census but he is working
with the City of New York and other plaintiffs in litigation stemming from the decision by 
the Department of Commerce not to adjust the results of the 1990 census.) 


1. The Title and the Paper Address Two Different Issues 


Should we have adjusted the census of 1980? The only sensible way to answer this question 
in my mind is to ask it in the context of the evidence available at the time, or at least available 
when the issue was being adjudicated by the courts. As such, the description of the issues
identified by the Bureau of the Census and presented in the opening section of the paper is
important, although those issues had little to do with the original decision not to adjust in 1980,
made by the Director of the Bureau in advance of the availability of coverage information.

The remainder of the paper, however, does not deal with this question. Rather, it addresses 
the continued attempt by advocates for the two sides to marshal evidence to support their posi- 
tions from the litigation. In essence, the authors are asking a question about the current evidence 
in support of a decade old decision. As with all statistical issues, continued data analysis and 
retrospection can update our judgment on the answer to such a question and thus the authors’ 
effort to revisit the evidence connected with the 1980 census yet again is to be applauded. 

We can thus turn to the framing of the question to be answered. For me, the judge’s state- 
ment of the issues at trial falls short of the mark, as does Freedman and Navidi’s description 
of the undercount issue. They imply that the only real issue is the accuracy of the adjustment 
process and that there is only a potentially small undercount about which we should be worried. 
Neither could be further from the truth. At issue are both the accuracy of the census and that of
the adjustment process. And it is the substantial differential undercount, i.e. the difference between
the undercount for Blacks and the undercount for non-Blacks and between Hispanic and 
non-Hispanic, that is important when we come to assess census accuracy. This is because census 
figures are typically used to divide resources among groups in the population, resources such 
as seats in the U.S. House of Representatives; seats in state legislatures; federal funds; 
and so on. 

Using the method of demographic analysis the Census Bureau has documented that, from
1940 through 1980, the difference in the rate of undercount for Blacks and non-Blacks has 
remained roughly constant, somewhere between 5% and 6% even though the overall 
undercount declined from 5.6% to 1.4% (see Fay et al., 1988). The 1.4% figure does not mean
that the census correctly counted over 98% of the U.S. population in 1980. Rather 1.4% 
represents the net undercount, which can be thought of as the difference between the actual 
undercount (consisting of missed individuals or omissions) and the overcount (erroneous 
enumerations and duplications). Even if the errors of overcount and undercount balanced 
perfectly at the national level, thus producing a 0% national undercount, we might still have 
a differential undercount problem. For the 1980 census, the Bureau determined that there were 
6 million erroneous enumerations in the census, of which as many as 1 million were fabrications,
and as many as 2.5 million people were erroneously included twice at the same location. Given 
the Bureau’s report of a net undercount of 1.4% or 3.2 million people in 1980, we have an 
estimate of 9.2 million omissions (people who were missed) from the 1980 census count. By 
adding omissions to erroneous enumerations we get a total of 15.2 million errors in counting 
individuals, which corresponds to almost 7% of the official 1980 census total. To me, this level 
of error in the census represents a major problem that must be addressed when we talk about 
the appropriateness of adjustment in 1980. Of course, shortly after the 1980 census was 
completed the Census Bureau painted a much rosier picture of the accuracy of the raw census 
counts. Perhaps, in keeping with the literal meaning of the title of this paper, Freedman and 
Navidi wish us to accept as accurate what we now know to have been a seriously incomplete 
assessment on the part of the Census Bureau. I hope this is not the case. We now know much 
more about the level of the error in the raw census counts from 1980. The residual issue is 
whether we have any better information about the various forms of adjusted counts given the 
passage of a decade. 
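
The bookkeeping behind the 9.2 million and 15.2 million figures can be retraced in a few lines. The following is a minimal sketch in Python, offered only as an illustration. The figures for erroneous enumerations and net undercount are those quoted above; the official 1980 census total (taken here as about 226,546 thousand) is not stated in the text and is an assumption, as are the variable and function names.

```python
# Figures quoted in the text, in thousands of persons.
ERRONEOUS_ENUMERATIONS = 6_000
NET_UNDERCOUNT = 3_200
# Assumption, not stated in the text: official 1980 census total, in thousands.
CENSUS_TOTAL_1980 = 226_546


def gross_errors(erroneous: int, net_undercount: int) -> tuple[int, int]:
    """Return (omissions, gross errors), using net = omissions - erroneous."""
    omissions = net_undercount + erroneous
    return omissions, omissions + erroneous


if __name__ == "__main__":
    omissions, gross = gross_errors(ERRONEOUS_ENUMERATIONS, NET_UNDERCOUNT)
    share = 100 * gross / CENSUS_TOTAL_1980
    print(omissions)        # 9200, i.e. 9.2 million omissions
    print(gross)            # 15200, i.e. 15.2 million gross errors
    print(round(share, 1))  # 6.7, i.e. "almost 7%" of the census total
```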


2. Facts and Theorems 


The present paper is full of statements about the accuracy of the census adjustment 
procedures. When it comes to stating and proving theorems, I have no doubt that Freedman 
and Navidi will get them correct. The relevance of such theorems for census adjustment is a 
different issue. 

Freedman and Navidi present a simple and seemingly compelling counterexample to the 
Schirm-Preston theorem on synthetic adjustment. It is certainly true that the overall totals for 
States A and B in their example are correct in the census and incorrect in the synthetic adjustment,
although barely so. But it is also true that the large shift of the counts of Whites and Blacks
in State B is, as I understand it, what an adjustment is designed to accomplish, and it does so
at the expense of a minor perturbation in State A. Moreover, if the fictional State B is like those
in the real U.S., the distributive accuracy of the synthetic data for geographic areas within 
State B is much improved while that within State A seems not to be seriously affected. Freedman 
and Navidi also offer their conclusion in the form of a parable to which I respond with one 
of my own. Small overall undercounts can hide a multiplicity of censal errors, ones that tend 
to “balance” in the aggregate but exact a heavy toll from states with large hard-to-count
minority populations. 

I also found the evidence from the Schirm-Preston simulations far more credible than did 
Freedman and Navidi and wonder whether this may be related to the corrected version of the 
Schirm-Preston theorem that is referred to as holding under more complicated sets of conditions 
involving weighted averages. What I am asking is whether the corrected theorem is more relevant 
to the real problems of undercount in the U.S. than Freedman and Navidi’s counterexample. 


3. Issues in Dispute with EKT


Freedman and Navidi spend much time rehashing the issue of the multiplicity of PEP series
and stressing the variations amongst them. While there is some merit in the position that
there is not a clear and overwhelming choice from amongst the adjustment alternatives, it may 
still be the case that several choices would be superior to an unadjusted census. The authors 
focus on the variation amongst the full set of 12 alternatives, some of which to me are 
implausible given the assumptions that they rely upon. Even though I do find the arguments 
in support of the use of synthetic adjustments reasonable, I do agree with the authors that there 
is a clear difference between the synthetic and PEP adjustments. 

Where are we left in this debate? I find the conclusion of Ericksen, Kadane, and Tukey 
compelling even though I agree with Freedman and Navidi that issues remain about the specific 
choice of techniques favored by EKT. Freedman and Navidi argue that their principal claim 
is irrelevant to the issue of accuracy. I disagree. Perhaps the authors believe that the millions 
of uncounted people that virtually all agree were missed in 1980 are still out hiding in the foothills 
of South Dakota, or in some other state with few minorities. 

A familiar theme in various writings by one of the present authors is the problems that arise 
when assumptions are not satisfied. Here again the authors pursue this theme with respect to 
the linear equation used for smoothing. They appear to argue that either all assumptions must 
be perfectly justified or “all bets are off”. Nothing could be further from the truth. Surely
they don’t expect anyone to believe the argument that the eruption of Mt. St. Helens interfered 
with census taking in a serious way and thereby undercuts the usefulness of the smoothing 
approach. Similarly, their notion that precise specification of predictor variables is crucial to 
the accuracy of smoothing is also something with which I take issue. Finally, I read the report 
by Ylvisaker (1991) who reexamined data from the trial census in Los Angeles in preparation 
for 1990, but I could not find the evidence Freedman and Navidi state is supportive of their 
claim that smoothing increases variability. 

I do believe with Freedman and Navidi that the census process is enormously complex and 
that the approach to adjustment that was proposed in connection with the litigation over the 
1980 census is far from flawless. Yet I still find their arguments exaggerated and they tend to 
obscure the old maxim that “the best is the enemy of the good.” Of course the assumptions
are not satisfied. Of course one could produce a better way to adjust that does not suffer from 
all of the flaws in the methods advocated by EKT. But this does not mean that adjustment 
with these flawed methods would not have been an improvement over the badly flawed 
unadjusted counts. 


4. Adjustment in 1990 


At various points throughout the paper the authors allude to comparable issues and 
imponderables in connection with adjustment in the 1990 census. I think that the reader should 
make a clear distinction between the methods used in connection with analyses presented as 
part of the 1980 lawsuit and those used as an integral part of the 1990 census. Many of the 
problems encountered by those who attempted to prepare adjusted figures in 1980 have clearly 
been overcome and the debate over adjustment in 1990 has become much sharper in its focus. 
Moreover, unlike in 1980, the key statistical methodologists at the Census Bureau, and the 
Director herself, found the adjustment methods used in 1990 justifiable and they recommended 
proceeding with an adjusted census. The statisticians were overruled by the Secretary of 
Commerce. The matter is now in the hands of the court once again. 

Freedman and Navidi do not state their position regarding the use of adjustment techniques
for the 1990 census, but Freedman (1991) makes quite clear that his judgment from 1980 has 
not changed. I disagree with this view. There may well be reason to argue, as the authors do, 
that the Census Bureau should not have adjusted the census in 1980. But 1990 is another 
matter. In June, the General Accounting Office, an investigative arm of the U.S. Congress, 
reported that there were 25.4 million gross errors in the 1990 census, or about 10.4% of the resident
population. The Bureau estimates that the net undercount was about 5 million people and that 
the differential undercount was the largest since the Bureau began estimating it
with the 1940 census. Methodology for carrying out an adjustment in 1990 is much improved
relative to that at issue in 1980. In my view, the results of the Census Bureau’s evaluation 
studies clearly supported the use of adjustment for the 1990 census results. Perhaps the judge 
this time will see the issue of adjustment differently from the way that Freedman and Navidi
tend to frame it. 


COMMENT


IVAN P. FELLEGI


Freedman and Navidi provide a very thorough and lucid description of the considerations 
and arguments surrounding the adjustment debate for the 1980 US Census. These arguments 
focus on the quality of population counts and population distributions for Census day 1980. 
Furthermore, it is taken as given that whatever decision is made on adjustment, based on this 
consideration of population counts, will be applied to the complete census database and 
therefore to all the outputs flowing from it. Rather than commenting in detail on the arguments 
of the protagonists in this debate (though I am of the view that the correct decision was made 
for the 1980 Census), I would like to offer, from a Canadian viewpoint, some thoughts that
suggest a broader frame of reference for the adjustment debate.


1. The Census is Much More Than a Head Count 


Ever since the age of modern census taking began, the objective has always been more than 
the provision of an accurate count of the population. Yet the increasingly impressive literature 
dealing with the issue of adjusting the census tries to assess the relative advantages of alternative 
courses of action solely from the point of view of estimating the total number (and proportion) 
of persons living in a set of areas. I understand, of course, why this is so: (a) the problems 
involved are difficult enough as it is, and (b) so much money and political power is associated 
with the population counts (or estimates). 

I will come back to point (b) above. As far as (a) is concerned, I think it would not be a
scientifically defensible position to adjust the census by whatever method without taking into 
account the impact of such an action on the multitude of uses of census data. Indeed, I believe 
that if the objective of the census was restricted to estimating population totals and distributions, 
we would most likely (at least in Canada) try to find quite different methodologies to fulfil 
such a very different role. Given the multiplicity of objectives served by the Census, the fact 
that this multivariate and rich data base is difficult to model is not an adequate excuse for dealing 
with the much simpler issue of population counts and then uncritically applying the conclusions 
to the entire data base. 


2. Point-in-time Precision of Population Counts May not Be the Relevant Measure 
for the Intercensal Distribution of Federal Funds and Power 


There seems to be a preoccupation with exquisite precision of population counts and distribu- 
tions in the census year. Of course a periodic stock-taking, providing good and comparable 
data for small areas and/or small population groups is a main justification for the expense 
involved in taking a census. But the excessive (it seems to me) preoccupation with the precision 
of the census count is motivated by equity considerations: a great deal of money and political 
power is distributed based, in part, on population numbers. Let us examine these two equity 
issues in turn. 

First, dealing with the distribution of funds, indeed substantial sums are distributed in 
Canada from the federal government to provinces based on formulae that are very sensitive 
to population numbers and distributions. However, two points are of great significance from 
the point of view of census adjustments. 

(1) The formulae use a large array of statistical information (most of it derived from sources
other than the census), only one component of which is population. It is well known that 
several of the other components are subject to significant sampling and non-sampling 
errors. It is an open question whether any reasonable loss function designed to assess the 
combined impact of all the errors involved would be materially improved even if the census 
errors could be entirely eliminated. 


(2) Even more important, if the adjusted population numbers result in a smaller loss function, 
or if more generally they are assessed to be closer to the truth for a significant majority 
of the areas involved, then these can serve as the basis for improved population estimates 
(and not just in census years) without adjusting the entire multivariate census data base. 
In Canada (and in the United States) there is a long history of publishing estimates of the 
census undercount. Serious consideration is, indeed, being given in Canada to taking the 
next step: publish the census results as taken, and have a set of official population estimates 
which takes account of the known census undercount. After all, in non-census years the 
official population estimates incorporate a wide range of estimation techniques - some 
of them having errors at least as large as the likely errors of undercount estimates (even 
model-based ones). It may be scientifically quite appropriate to publish the best available 
population estimates in both census and non-census years - whether or not these estimates 
coincide with the census counts in census years. It may well be that legislation, or regulations 
under existing legislation, have to be amended to permit the use, particularly in a census 
year, of population estimates different from those directly derived from the census. But 
(a) that has little to do with the scientific arguments involved in the adjustment debate, 
and (b) it is more honest than relabelling the “adjusted” census counts to be “the” census
counts simply because the law might require the latter. 


The arguments are different in respect of the distribution of political power based on census 
counts, although the fixation on point-in-time precision seems to me to be equally misplaced. 
Indeed, the census population figures are also used to distribute seats in the House of Commons 
in Canada (and in the House of Representatives in the USA). However, the distribution of 
seats based on the census is used for ten years. During those years typically massive population 
shifts occur. Leaving aside the interpretation of laws, it seems to me that the substantive question 
is whether a suitably defined loss function, designed to capture the average deviation from the 
objective of “one person one vote” over a ten year period, would be materially reduced if the
census counts were adjusted for the estimated undercount. I have not made such a calculation.
However, it seems to me that the range of population shifts over ten years is substantially
larger than the range of estimated undercounts. I would, therefore, speculate that even
apparently significant potential census year adjustments (and corresponding shifts in the
allocation of seats in the legislature) are relatively less significant than the deviations from the
“one person one vote” rule occasioned by migration over a ten year period. Since this particular
use of the census is mandated by the constitution, changing the law is not an option. But a 
scientifically informed debate regarding the appropriate interpretation of the constitution is 
very much in order - taking full account of the two main causes of deviation from equity in 
political representation during the ten year intercensal period: census errors and population 
shifts (mostly migration). 


3. Conclusion


(a) The census is a multivariate integrated data base. The case for “adjusting” it is far from
obvious, even if (a big if) the simplest variable involved - the count - can be improved
by doing so.


(b) If a set of population estimates that are judged to be better than the census counts (according
to suitably defined criteria) can indeed be generated, these should be produced and used,
without necessarily adjusting the entire census data base. The criteria should relate to the
set of areas (and other breakdowns) for which estimates are required.


(c) If the law requires “census” derived population counts when in fact substantively the best
available population estimates are called for, it would appear to be preferable to try to
change the law rather than to adjust (in effect weight) the entire census data base to agree
with estimated population numbers - solely in order to be able to refer to the population
estimates as “the census”.


(d) Equity considerations, both in terms of the distribution of federal funds and political
representation, apply to the entire intercensal period, not simply for the year of the census.
They should be studied using models that take full account of this fact.


COMMENT


N. CRESSIE


A critical assessment of our past successes and failures makes us better equipped to provide 
future successes. Missing data and matching problems in the 1980 Post Enumeration Program 
were major impediments to a successful adjustment of the 1980 U.S. Decennial Census. A court 
case, Cuomo et al. versus Baldrige, was brought by New York State and others to require
the Census Bureau to adjust the 1980 Census numbers for undercount. Testimony from Barbara 
Bailar, then Associate Director of Statistical Standards and Methodology at the Census Bureau, 
and Kirk Wolter, then Chief of the Statistical Research Division at the Census Bureau, made 
it clear that 1980 data and methods were inadequate for an accurate adjustment of the whole 
country. 

In 1987, Judge Sprizzo ruled against New York. However, that decision did not make the 
differential undercount go away; even the judge in his ruling acknowledged its presence. There 
is little disagreement that, differentially by race, national U.S. Census numbers have been 
persistently too small. Using demographic methods, the following estimates are available. 


1950: Black (and other non whites) demographically estimated undercount was 9.7%. White 
demographically estimated undercount was 2.5%. (Siegel 1974, Table 3). 


1960: Black (only) demographically estimated undercount was 8.0%. White (and other races) 
demographically estimated undercount was 2.1%. (Siegel 1974, Table 2, set D estimates). 


1970: Black (only) demographically estimated undercount was 7.6%. White (and other races) 
demographically estimated undercount was 1.5%. (Passel, Siegel and Robinson 1982, 
Table 1). 


1980: Black (only) demographically estimated undercount was 5.3%. White (and other races) 
demographically estimated undercount was -0.2%. (Passel and Robinson 1984, Table 2).
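
The differential undercount implied by the four pairs of estimates above is simply the Black rate minus the White rate. The following is a minimal sketch in Python, for illustration only. It reads the 1980 White figure as -0.2% (a small net overcount), which is an interpretation of the printed value. The group definitions differ slightly between years, so the differences are only roughly comparable. The names `ESTIMATES` and `differential_undercount` are hypothetical.

```python
from decimal import Decimal

# Demographic estimates of percent net undercount quoted in the list above:
# year -> (Black, White). Group definitions differ slightly by year.
ESTIMATES = {
    1950: (Decimal("9.7"), Decimal("2.5")),
    1960: (Decimal("8.0"), Decimal("2.1")),
    1970: (Decimal("7.6"), Decimal("1.5")),
    1980: (Decimal("5.3"), Decimal("-0.2")),  # assumed to be a net overcount
}


def differential_undercount(estimates):
    """Return year -> Black rate minus White rate, in percentage points."""
    return {year: black - white for year, (black, white) in estimates.items()}


if __name__ == "__main__":
    for year, gap in differential_undercount(ESTIMATES).items():
        print(year, gap)  # 1950 7.2, 1960 5.9, 1970 6.1, 1980 5.5
```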


Further, there is little disagreement that racial composition is different within administrative 
regions (both large and small) across the U.S.A. The consequence of these two virtually 
undeniable facts is that undercount will be differential across administrative regions, leading 
to an unrepresentative geographic/racial profile of the nation and an unfair apportioning of 
political and financial resources. So, Freedman and Navidi state in their introduction “...
If the undercount can be estimated with good accuracy, especially at the local level, adjustments
can - and should - be made to improve the census.”

Almost everyone agrees there is a problem. The adage, “If it ain’t broke don’t fix it,” does
not apply here. It is an uncomfortable defence for a statistics professional to argue that 
uncontrolled-for biases and errors will not allow an adjustment for an undercount that is known 
to be there and known to be damaging. During the early 1980s, Bailar and Wolter established 
the Undercount Research Staff within the Statistical Research Division of the Census Bureau. 
Staff members have produced high-quality research that demonstrated “that it is technically
feasible to correct the 1990 Census for differential undercoverage” (Childers et al. 1987).

It is time for Freedman and Navidi to relinquish their role as devil’s advocates; it is time 
for them to put their knowledge and talents into a constructive mode; and it is time for them 
to say what they mean by “good accuracy,” “local level,” and various other qualitative
affirmations. The adversarial atmosphere of the courts has spilled over into the various articles, 
comments, and rejoinders we have seen on census undercount in the last 10 years. To solve
a problem as hard as adjustment for undercount, the common goal needs to be recognized. 
From there, debate should center around differences on how that goal might be reached. If 
Freedman and Navidi’s position is that the goal is impossible to reach (which is what they seem 
to have implied over the years), then it should be stated. 

For the rest of this comment, I shall address a number of important technical matters that 
were raised by Freedman and Navidi (1986) and now, surprisingly, again in the article under 
discussion. In 1990, I presented a paper at the Census Bureau’s Annual Research Conference 
(Cressie 1990) that David Freedman was invited to discuss. At the last minute, he was unable 
to attend the conference but I continued to send him the discussion version and the final version 
and invited his comments. The paper is rather technical but addresses, successfully I believe, 
several major criticisms made by Freedman and Navidi (1986) of the statistical modeling 
approach to undercount adjustment. 

First, the paper expresses a preference for the “stratification approach” over the “regression
approach”. Stratification is a special case of regression where the explanatory variables are
restricted to 1 and 0, indicating presence or absence in a particular (demographic) stratum.
There is little disagreement that undercount is differential across sex × age × race/ethnicity
strata. Because the Census Bureau was committed to a regression approach, the bulk of the 
paper addressed the more general problem. 
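
The statement that stratification is a special case of regression can be illustrated numerically: least squares on 0/1 stratum indicators (with no intercept) returns the stratum means. The following is a minimal sketch in Python. The data, the function names and the small linear solver are all hypothetical, and the sketch assumes every stratum contains at least one observation.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols_on_indicators(strata, y, n_strata):
    """Least-squares coefficients for y regressed on 0/1 stratum indicators."""
    x = [[1.0 if s == k else 0.0 for k in range(n_strata)] for s in strata]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(n_strata)]
           for i in range(n_strata)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(n_strata)]
    return solve(xtx, xty)


def stratum_means(strata, y, n_strata):
    """Plain within-stratum averages of y."""
    sums = [0.0] * n_strata
    counts = [0] * n_strata
    for s, yi in zip(strata, y):
        sums[s] += yi
        counts[s] += 1
    return [total / count for total, count in zip(sums, counts)]


if __name__ == "__main__":
    strata = [0, 0, 1, 1, 1, 2]           # hypothetical stratum labels
    y = [2.0, 4.0, 1.0, 2.0, 3.0, 7.0]    # hypothetical undercount rates
    print(ols_on_indicators(strata, y, 3))
    print(stratum_means(strata, y, 3))
```

Both calls return the same three numbers, because the indicator columns are orthogonal and the normal equations reduce to stratum sums divided by stratum counts.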

Second, if one allows the regression error (see Freedman and Navidi’s eq. (2)) to be
dependent, such models can absorb bias and misspecification into the error term. The important
concept to maintain is that true undercount in regions is unknown and the ignorance is
quantified into a probability model. The goal is not estimation of the coefficients β but
prediction of the undercount. With an error term that does not have to be independent and 
identically distributed, this prediction is insensitive to misspecification (see also Cressie 1991, 
Chapter 3). 

Third, the inconsistency of the model to changes in geographic level is addressed by modeling 
adjustment factors, not undercounts, and by assuming the variance of the regression error of 
a particular area is inversely proportional to that area’s population. This assumption is justified, 
from both a Bayesian and frequentist point of view, in Cressie (1989). 

Fourth, the effect of estimation of variance-covariance parameters can be taken into account 
by modifying the results of Prasad and Rao (1990) to a multivariate context. One could also 
use a parametric bootstrap, by generating data from the estimated model, re-estimating all
parameters, and repredicting the undercount. 
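
The three bootstrap steps just named (generate, re-estimate, re-predict) can be sketched on a deliberately simplified model. The following Python sketch is not the multivariate regression model discussed in this comment. It assumes a hypothetical one-parameter setup: each area's direct estimate is y_i = theta_i + e_i, with theta_i drawn from N(mu, a) and e_i from N(0, d), where d is a known sampling variance. All names and data are illustrative.

```python
import random
import statistics


def fit(y, d):
    """Moment estimates of the mean mu and the model variance a (floored at 0)."""
    mu = statistics.fmean(y)
    a = max(0.0, statistics.variance(y) - d)
    return mu, a


def predict(y, mu, a, d):
    """Shrink each direct estimate toward mu by the factor a / (a + d)."""
    shrink = a / (a + d)
    return [mu + shrink * (yi - mu) for yi in y]


def bootstrap_mse(y, d, reps=500, seed=0):
    """Parametric-bootstrap mean squared prediction error for each area."""
    rng = random.Random(seed)
    mu_hat, a_hat = fit(y, d)
    sq_err = [0.0] * len(y)
    for _ in range(reps):
        # 1. Generate data from the estimated model.
        theta = [rng.gauss(mu_hat, a_hat ** 0.5) for _ in y]
        y_star = [rng.gauss(t, d ** 0.5) for t in theta]
        # 2. Re-estimate all parameters from the simulated data.
        mu_star, a_star = fit(y_star, d)
        # 3. Re-predict, and compare with the simulated truth.
        pred = predict(y_star, mu_star, a_star, d)
        for i, (p, t) in enumerate(zip(pred, theta)):
            sq_err[i] += (p - t) ** 2
    return [s / reps for s in sq_err]


if __name__ == "__main__":
    y = [1.2, 0.4, 2.9, 1.8, 3.5, 0.9]  # hypothetical direct estimates (%)
    print(bootstrap_mse(y, d=0.5))
```

Because the parameters are re-estimated inside every replicate, the resulting error measure reflects the extra variability from estimating the variance parameters, which is the effect the fourth point is concerned with.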

Finally, it is acknowledged that all preceding model-based methods will likely do poorly 
if the model does not fit. Diagnostic methods are crucial to the success of statistical model- 
based adjustments for undercount. 

There is room for critical assessment of our past successes and failures. It is time to move 
on and solve this monumentally important problem with cutting-edge technology. A well 
designed, well implemented, and quality-assured 1990 Post Enumeration Survey with excellent 
computer matching and precise geography makes the 1980 case look very different indeed. It
is my opinion that adjustment can now be successfully carried out at the state level. Research
and debate on whether that success can be carried down to lower levels of geography deserve
our collective resources (e.g. Tukey 1983; Cressie 1988; Wolter and Causey 1991). Expected 
losses (or risks) can be used to measure the efficacy of adjustment procedures. Cressie (1988) 
gives sufficient conditions under which synthetic adjustment improves over census count; those 
conditions were satisfied in the 1980 Census and PEP 3-8 series. 
COMMENT 


ALLEN L. SCHIRM and SAMUEL H. PRESTON¹ 


1. Introduction 


We thank the editor for inviting us to comment on this provocative article by Freedman 
and Navidi (hereafter, ‘‘F and N’’) and continue this important policy debate. Our comment 
mainly responds to F and N’s criticisms of our earlier research (Schirm and Preston 1987; 
hereafter, S and P). Although we disagree with much of F and N’s critique of Ericksen, Kadane 
and Tukey (1989), we leave to the authors of that article the task of defending their work. 

We disagree with many of F and N’s specific criticisms of S and P. Before discussing our 
detailed responses, we want to take a broader perspective and view our article and F and N’s 
criticisms of it in their entirety. 

F and N wrongly characterize our article in stating that we ‘‘present two major arguments, 
one analytical and one based on simulation.’’ In fact, we presented three analytical results. 
F and N criticize only one, and a minor one at that. Our most important analytical result suggests 
that synthetic adjustment would likely have improved the accuracy of the population distribution 
in 1980. As for our simulations, they were not intended to support any one argument. Instead, 
we simulated an extremely wide variety of circumstances to permit us to address several questions 
about synthetic adjustment and its effects. We found, however, that adjustment would have 
improved accuracy under all conditions simulated, including highly unfavorable circumstances. 


2. Analytical Results 


In S and P, we presented three analytical results. All three are mathematically correct. However, 
the second result - the sole target of F and N’s criticism - is, as we stated in our article, poten- 
tially ‘‘misleading because it ignores influences on overall adjustment success of systematic 
relationships between variations across states in census coverage for a group and differences 
between groups in how they are distributed across states.’’ Our third result, which is clearly 
the focus of our algebraic analysis and which does not depend on the second result, addresses 
this issue and takes into account the patterns of variations in undercounts across states. 
Although potentially misleading, we presented the second result to illustrate more forcefully 
a key implication of our third result, that systematic variations in state undercounts can matter. 

Our second analytical result suggests that the effect of adjustment for a given state hinges 
on how ‘‘close’’ the state’s undercounts are to the national undercounts. Contrary to F and 
N’s claim, our second analytical result is mathematically correct. F and N are able to dispute 
our finding only because they choose to define ‘‘close’’ without regard to our precise defini- 
tion. Thus, F and N’s ‘‘counterexample’’ to our second result does not pertain to that result 
at all, since their example violates the conditions that we derived and stated precisely in the 
appendix to our article. To repeat that result, we showed that the estimated proportion of the 
total national population residing in state i is made more accurate by adjustment if 
where J is the number of racial (more generally, demographic) groups, a dot indicates 
summation over an index, and T and C superscripts designate true and census population 
counts, respectively. For state A in F and N’s ‘‘counterexample,’’ this expression implies 
Therefore, the condition for improved accuracy from adjustment is violated for state A. 
Similarly, for state B, we get 
Again, the condition for improved accuracy from adjustment is violated. 


F and N’s ‘‘counterexample’’ says nothing about our second analytical result. However, 
it is useful for numerically illustrating our third and clearly most important result. According 
to that result, when blacks are most heavily undercounted where they are least prevalent and 
whites are most heavily undercounted where they are most prevalent, synthetic adjustment may 
not improve the accuracy of the proportionate distribution. In F and N’s example, state A has 
a higher black undercount than state B (50% versus 17%) but proportionately fewer blacks 
(2% versus 12%). State A has a higher white undercount (smaller overcount) than state B 
(−1% versus −2%) and proportionately more whites (98% versus 88%). Therefore, F and N’s 
finding that the adjusted estimates in their example are less accurate overall than the census 
estimates, although not guaranteed, is not surprising in light of our third analytical result. 

F and N’s critique of our algebraic analysis of the effects of adjustment is based on a highly 
selective reading of our article that misrepresents our findings. F and N’s criticism of our second 
analytical result is wrong as is their characterization of that result as central to our article. Our 
third analytical result is by far more important. It helps to expose those conditions on which 
adjustment’s success or failure depends. Based on available empirical evidence cited below, 
the conditions of F and N’s numerical example did not prevail in 1980, and our result suggests 
that synthetic adjustment would have improved the accuracy of the geographic distribution. 


3. Simulation Results 


As noted before, the purpose of our simulations was to answer several questions pertinent 
to synthetic adjustment and its effects on the accuracy of population estimates. The central 
questions addressed in our article were: 


• How often would synthetic adjustment improve the accuracy of population estimates? 
• How much would synthetic adjustment typically improve the accuracy of population 
  estimates? 
• Do the effects of synthetic adjustment on accuracy depend on how much census coverage 
  varies from state to state? 
• Do the effects of synthetic adjustment on accuracy depend on how well we measure national 
  undercounts? 


F and N focus on the second question. For the most part, we agree that the average magnitude 
of improvement in accuracy from synthetic adjustment is modest if our conservative assump- 
tions about the state of the nation pertain. Under Case 22 of Scenario I, which probably 
exaggerates interstate variations in census coverage but is presented in S and P as our 
‘‘moderate’’ variation case, the average reduction in the weighted sum of squared errors is just 
8% while the average reduction in the unweighted sum of absolute errors is only 4%. It is 
important to understand, however, that larger improvements could be realized, as suggested 
by our third analytical result presented in S and P. The gains in accuracy would be somewhat 
greater, for example, if Hispanics had the same national undercount as blacks and were included 
with blacks instead of whites. In that case, the average reduction in the weighted sum of squared 
errors would be over 12%. The gains in accuracy would also be greater if black undercounts 
were higher in states with proportionately more blacks. We will return to this point shortly. 
Of course, improvements from synthetic adjustment might be smaller if there were substantial 
errors in measuring undercounts, although as we showed in S and P, the effects of measurement 
error are generally small. 

What is easily forgotten in assessing the average gain in accuracy is the likelihood of realizing 
some gain, large or small. F and N are guilty of this oversight. Under the assumptions of 
Case 22, Scenario I, the likelihood of a gain in accuracy, according to the weighted sum of 
squared errors criterion, is 84%. We are impressed by this finding. Some improvement, perhaps 
only modest, is highly likely. 

This result and the result on the average magnitude of improvement raise critical questions. 
What is the implication of the average improvement being ‘‘only modest’’? Does the average 
improvement have to be overwhelming to justify adjustment? Put differently, should adjusted 
estimates be held to a higher standard than census estimates? The secretary of commerce 
imposed a higher standard in making the 1990 adjustment decision. How would the Census 
Bureau’s coverage improvement and imputation procedures fare by an equally high standard? 
We suspect that some would not fare well, having almost certainly exacerbated rather than 
ameliorated the differential undercount. Finally, would adjustment be recommended if it did 
little to improve accuracy but reduced systematic inequity? We will return to this last question 
in Section 4. 

F and N answer these questions - which, by and large, do not have statistical answers - only 
implicitly, if at all. They suggest, however, that adjustment might be attractive (its estimates 
‘‘will be good’’) if the assumptions of our paper hold, the issue to which we now turn. 

F and N wrongly characterize both the synthetic method and our simulation model. The 
underlying assumption of the synthetic method is not that there is no systematic geographic 
variation in undercounts for a given race but that there is no variation at all. Our simulation 
model shows how synthetic adjustment performs when this synthetic assumption is violated. 
We considered cases of extreme, albeit nonsystematic, interstate variation in undercounts by 
race, as well as cases with more moderate random variation. We did not construct true popula- 
tions ‘‘on the basis of the synthetic assumption,”’ and our ‘‘definition of truth’’ did not ‘‘favor 
synthetic adjustment.’’ As we showed analytically in S and P, synthetic adjustment would have 
been favored by assuming a positive association between the black undercount and the 
prevalence of blacks or a negative association between the white undercount and the prevalence 
of whites. (A precise statement of the result is contained in the appendix to S and P.) 
For purely illustrative purposes, we assumed in a new round of simulations that white 
undercounts are generated according to the assumptions of Scenario I, Case 22 in S and P but 
that the expected black undercount rises with the state’s proportion black such that the black 
undercount is 2.0% when the proportion black is 11.7% (the national proportion black in 1980 
according to the census) and 5.2% when the proportion black is 20.0%. Under those conditions, 
which we do not claim to be realistic although they preserve the average simulated differential 
in national undercounts, synthetic adjustment improves the accuracy of the proportionate 
distribution according to the weighted sum of squared errors criterion in all 1,000 iterations. 
The average reduction in the weighted sum of squared errors is over 17%, despite extreme 
variation in state total undercounts. 


Do the assumptions of S and P pertaining to interstate variations in undercounts hold? 
Probably not. Although one of our purposes was to simulate a wide range of circumstances, 
it is very likely that our assumptions tended to put adjustment at a disadvantage. 


Did we ‘‘offer no evidence’’ on the matter of geographic variation, as F and N claim? No, 
although admittedly there was not a wealth of information available. For judging our assump- 
tions and their implications, there are two relevant empirical issues: whether variation is 
systematic or random and the extent of variation. We addressed both in S and P. 


As we noted in S and P, according to the 1980 PEP blacks are hardest to count where they 
comprise large proportions of the population. In contrast, there is essentially no relationship 
between the white undercount and the relative prevalence of whites (Ericksen and Kadane 1983). 
These conclusions are based on broad categories measuring racial composition and data for 
Standard Metropolitan Statistical Areas and state remainders, not state-level data. The only 
published undercount estimates by state and race are the ‘‘Developmental Estimates’’ for 1970. 
Although seriously flawed, based on heroic assumptions about internal migration (Wolter 1987), 
those estimates imply a direct relationship between the black undercount and the prevalence 
of blacks and a weak inverse relationship between the white undercount and the prevalence 
of whites. By ignoring either pattern of covariation, the simulations in S and P tend to understate 
the gains in accuracy from synthetic adjustment. 


Since writing S and P, we have obtained unpublished state population and undercount 
estimates by race from the 1980 PEP. Because the raw black undercount estimates are imprecise 
for several states, it is not clear whether blacks are hardest to count - at the state level - where 
they are most prevalent. For whites, although there is evidence of a direct, rather than an inverse, 
association between their prevalence and the undercount, we believe that this is attributable 
to the inclusion of Hispanics in the white population and, to a much smaller degree, to the 
relatively heavy reliance on the conventional method of enumeration in a few predominately 
white states in the western U.S. Indeed, we find that if the true 1980 population followed the 
pattern of either the Series 2-9 or 10-8 estimates, a synthetic adjustment for the differential 
between the undercount of blacks and Hispanics and the undercount of all other persons would 
almost certainly have improved accuracy. 


The available empirical evidence generally suggests that geographic variations are, if not 
random, systematic with a pattern that would enhance the gains in accuracy from synthetic 
adjustment. It seems unlikely that there is a strong inverse association across states between 
the black undercount and the prevalence of blacks or a strong direct association between 
the white undercount and the prevalence of whites. (Even if one or both of these patterns 
existed, adjusted estimates might still be more accurate, as we showed in S and P.) Thus, our 
assumption of randomness in S and P was probably conservative, working against synthetic 
adjustment. 
Our assumptions about the extent of interstate variations in undercounts were also 
probably conservative, as we discussed in S and P with reference to the 1970 Developmental 
Estimates. Due to substantial sampling errors for many states, the unpublished state under- 
count estimates by race from the 1980 PEP do not reliably reveal how much black under- 
counts vary across states. The variance in black undercounts calculated across all 51 states 
is 0.0128 for Series 2-9, twice the highest assumed value in our simulations. The variance falls 
to 0.0036, not even midway between the moderate and high variances simulated, when New 
Hampshire (black undercount equal to −60%) and Vermont (black undercount equal to 
−24%) are excluded. (For Series 10-8, the interstate variance is nearly equal to the moderate 
value simulated in S and P, if three states with extreme (and highly unreliable) undercounts 
- less than −20% or greater than 20% - are excluded.) Raw estimates of state undercounts 
for whites from the 1980 PEP are far more precise. The interstate variances for Series 2-9 
and 10-8 are just slightly below the moderate value simulated. The gains in accuracy under 
S and P’s Scenario I, Case 12 (high variation among black undercounts and moderate varia- 
tion among white undercounts) differ little in frequency or magnitude from the gains under 
Case 22 (moderate variation for both black and white undercounts), where improvements are 
highly likely. 

From published PEP estimates for 1980, we can only calculate variances in total state under- 
counts, not differentiated by race. The largest interstate variance among the 12 published PEP 
Series is 0.00034, slightly less than the average simulated variance for our moderate variation 
case (Case 22). For Case 32 (low variation among black undercounts and moderate variation 
among white undercounts), the average simulated variance is 0.00031, about equal to the 
interstate variance for PEP Series 2-9, which is favored by Ericksen, Kadane and Tukey (1989) 
and is the median variance across the 8 PEP Series remaining after excluding 10-8, 14-8, 14-9, 
and 14-20. Synthetic adjustment reduces the weighted sum of squared errors by about 12% 
on average under Case 32, compared to 8% for Case 22. Case 23 (moderate variation among 
black undercounts and low variation among white undercounts) implies an average simulated 
variance only slightly greater than the variance for PEP Series 10-8, F and N’s ‘‘favorite.’’ 
Under the conditions of Case 23, synthetic adjustment reduces the weighted sum of squared 
errors by 19% on average. Adjusted estimates are more accurate over 92% of the time according 
to that error criterion. Are such improvements ‘‘only modest’’? 


4. Accuracy and Equity 


We have argued before, in S and P and in Schirm (1991), that the foremost concern of statisti- 
cians and demographers should be the accuracy of population estimates. Yet, in a single-minded 
pursuit of statistical accuracy, it is easy to forget considerations of political equity. 

A more accurate population distribution is probably more equitable, in general. However, 
this does not imply that two equally accurate distributions are equally equitable. Although 
adjustment may do little to improve overall accuracy in a particular year, it may reduce or 
remove certain systematic errors and systematic inequity, errors and inequity associated 
with race. 

An example, obtained from our simulations, is displayed in Table 1. The implied black and 
white national undercounts are 5.2% and −1.1%. The adjusted population estimates in Table 1 
were obtained using these figures and the synthetic method. 
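
The synthetic calculation can be sketched in a few lines of Python. The sketch assumes that an undercount rate is defined as (true − census) / true, so that a negative rate denotes a net overcount; the function name is hypothetical, and the two census counts are taken from Table 1.

```python
def synthetic_adjust(census_count, national_undercount_rate):
    """Scale a census count by the national undercount rate for its group.

    Assumes rate = (true - census) / true, hence true = census / (1 - rate).
    A negative rate is a net overcount.
    """
    return census_count / (1.0 - national_undercount_rate)

BLACK_RATE = 0.052    # national black undercount quoted in the text
WHITE_RATE = -0.011   # national white undercount (a net overcount)

# Census white count for Alabama and census black total, in thousands (Table 1).
alabama_white_adjusted = synthetic_adjust(2898, WHITE_RATE)
national_black_adjusted = synthetic_adjust(26495, BLACK_RATE)

print(round(alabama_white_adjusted), round(national_black_adjusted))
```

With the rounded rates quoted in the text, the Alabama figure agrees with the adjusted white count in Table 1, and the national black total comes out within a few thousand of the tabulated adjusted total; the small gap is what one would expect from rounding the rates to one decimal place.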

As will become clear, it is hard to draw a sharp distinction between accuracy and equity. 
For this discussion, we assume that accuracy is narrowly defined in terms of the proportionate 
geographic distribution. 
Table 1 

A Numerical Example: Population Counts (1,000s) 

                            True             Census           Adjusted
State                     Black    White    Black    White    Black    White

Alabama                       …    2,849        …    2,898    1,051    2,866
Alaska                        …        …        …      388       15      384
Arizona                       …    2,606        …    2,643       79    2,614
Arkansas                      …    1,916        …        …      395    1,891
California                    …   22,105        …   21,849    1,919   21,607
Colorado                      …    2,719        …    2,788      108    2,757
Connecticut                   …    2,875        …        …      229    2,859
Delaware                      …        …        …        …      101      492
District of Columbia          …        …        …        …      474      187
Florida                       …        …        …        …        …    8,310
Georgia                       …        …        …        …    1,546    3,954
Hawaii                        …        …        …        …       18      938
Idaho                         …        …        …        …        3      931
Illinois                      …        …        …        …    1,768    9,644
Indiana                       …        …        …        …      438    5,019
Iowa                          …        …        …        …       44    2,840
Kansas                        …        …        …        …      133        …
Kentucky                      …        …        …        …        …    3,364
Louisiana                     …        …        …        …    1,306    2,935
Maine                         …        …        …        …        3    1,110
Maryland                      …        …        …        …    1,011    3,273
Massachusetts                 …        …        …        …        …    5,455
Michigan                      …        …        …        …    1,265    7,974
Minnesota                     …        …        …        …       56    3,978
Mississippi                   …        …        …        …      936    1,616
Missouri                      …        …        …        …      542    4,354
Montana                       …        …        …        …        …      776
Nebraska                      …        …        …        …       51    1,505
Nevada                        …        …        …        …       54      741
New Hampshire                 …        …        …        …        4      907
New Jersey                    …        …        …        …      976    6,369
New Mexico                    …        …        …        …       25    1,265
New York                      …        …        …        …    2,535   14,988
North Carolina                …        …        …        …    1,392    4,512
North Dakota                  …        …        …        …        3      643
Ohio                          …        …        …        …    1,136    9,613
Oklahoma                      …        …        …        …      216    2,789
Oregon                        …        …        …        …       39        …
Pennsylvania                  …        …        …        …    1,105   10,697
Rhode Island                  …        …        …        …       30      909
South Carolina                …        …        …        …    1,001    2,149
South Dakota                  …        …        …        …        …      681
Tennessee                     …        …        …        …      766    3,822
Texas                         …        …        …        …    1,804   12,380
Utah                          …        …        …        …        9    1,436
Vermont                       …        …        …        …        1      504
Virginia                      …        …        …        …    1,065    4,290
Washington                    …        …        …        …      142    3,981
West Virginia                 …        …        …        …       69    1,864
Wisconsin                     …        …        …        …      193    4,473
Wyoming                       …        …        …        …        3      462

Total                    27,950  197,836   26,495  200,054   27,956  197,838

Note: ‘‘White’’ includes all nonblacks. 
Table 2 

A Numerical Example: Congressional Apportionments 

State                      True

Alabama 
Alaska 
Arizona 
Arkansas 
California 
Colorado 
Connecticut 
Delaware 
District of Columbia 
Florida 
Georgia 
Hawaii 
Idaho 
Illinois 
Indiana 
Iowa 
Kansas 
Kentucky 
Louisiana 
Maine 
Maryland 
Massachusetts 
Michigan 
Minnesota 
Mississippi 
Missouri 
Montana 
Nebraska 
Nevada 
New Hampshire 
New Jersey 
New Mexico 
New York 
North Carolina 
North Dakota 
Ohio 
Oklahoma 
Oregon 
Pennsylvania 
Rhode Island 
South Carolina 
South Dakota 
Tennessee 
Texas 
Utah 
Vermont 
Virginia 
Washington 
West Virginia 
Wisconsin 
Wyoming 

Total                       435
Although the census and adjusted distributions in Table 1 are equally accurate, according 
to a weighted sum of squared errors criterion, the adjusted estimates more accurately reflect 
the racial distribution at the national level and are more equitable. (The census geographic 
distribution is slightly more accurate according to a sum of absolute errors standard.) The true 
and adjusted figures imply that 12.4% of the U.S. population is black. According to the census, 
only 11.7% of the population is black, a serious inequity. 

The equity gains from adjustment are made more concrete by the implied congressional 
apportionments shown in Table 2. Both the census and adjusted estimates allocate one too 
many seats to Iowa and Massachusetts and one too few to California and Virginia. However, 
the census estimates also allocate one too many seats to Colorado and New York and one too 
few to Alabama and Georgia, whereas the adjusted estimates allocate the correct numbers of 
seats to these states. Based on the census figures, Alabama and Georgia are denied represen- 
tation because of their high proportions black and the differential undercount to which blacks 
are subject. Adjustment substantially improves equality of representation. Based on the true 
population figures and the census apportionment, there are 471,000 persons per representative 
in Colorado, 508,000 in New York, 555,000 in Georgia, and 558,000 in Alabama. Adjustment 
narrows the differences, with 565,000 persons per representative in Colorado, 524,000 in New 
York, 505,000 in Georgia, and 489,000 in Alabama. For the four states combined, there are 
519,000 persons for each of the 57 representatives. (For the entire U.S., there are 519,000 
persons for each of the 435 representatives.) Adjustment reduces the (unweighted) root mean 
square deviation from this average by over 21%. (The reduction is between 20% and 21% when 
deviations are weighted by the number of persons per representative according to the true 
population figures.) The equity gain from adjustment is also clearly revealed by the weighted 
average of persons per representative calculated across all 50 states. When weighted by the 
proportion of the national black population (exclusive of the District of Columbia) living in 
the state, the true average number of persons per representative is 518,000. If Congress is appor- 
tioned according to the census estimates, the average is 524,000. Synthetic adjustment removes 
most of this racial inequity. The average number of persons per representative is 520,000 when 
House seats are allocated according to the adjusted estimates. Although there are surely still 
other ways to measure inequality of representation, it is hard to imagine a reasonable alternative 
that would not show adjustment reducing the racial inequity attributable to differential census 
undercounting. The gain in equity in this example is achieved despite no gain in accuracy of 
the proportionate distribution across states. 
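
The root mean square comparison for the four states can be checked directly from the figures quoted above. The following Python sketch uses only those figures; the function and variable names are illustrative.

```python
import math

AVERAGE = 519  # persons per representative (1,000s), four states combined

# Persons per representative (1,000s) based on the true population, in the
# order Colorado, New York, Georgia, Alabama, as quoted in the text.
census_apportionment = [471, 508, 555, 558]
adjusted_apportionment = [565, 524, 505, 489]

def rms_deviation(values, center):
    """Unweighted root mean square deviation of values from center."""
    return math.sqrt(sum((v - center) ** 2 for v in values) / len(values))

rms_census = rms_deviation(census_apportionment, AVERAGE)
rms_adjusted = rms_deviation(adjusted_apportionment, AVERAGE)
reduction = 1.0 - rms_adjusted / rms_census

print(f"census: {rms_census:.1f}  adjusted: {rms_adjusted:.1f}  reduction: {reduction:.1%}")
```

The computed reduction is a little over 21%, in line with the unweighted figure stated in the text.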

The errors in the census are systematic. After adjustment, the remaining errors may not 
be truly random, and so long as there are errors, there will be inequity. However, the source 
of those errors would be far less offensive than race. 


5. Discussion 


In criticizing S and P and Ericksen, Kadane and Tukey (1989), F and N emphasize the role of 
assumptions underlying adjusted estimates. Their view, however, is extreme, counterproductive, 
and fundamentally flawed. 

Although it is reasonable - and necessary - to ask whether assumptions matter, assumptions 
do not have to be exactly true as F and N imply. Moreover, proponents of adjustment should 
have to defend it against only reasonable alternative assumptions. F and N seem to believe 
that almost any alternative is fair game. As in assessing the magnitude of improvement, they 
require adjustment to bear a heavy burden of proof with no scientific justification. Nonetheless, 
as our simulations show, synthetic adjustment improves accuracy even under extremely 
unfavorable - and probably unreasonable - assumptions. 
This raises another important point of disagreement between us and F and N. Not all 
assumptions have to imply precisely the same estimates if all adjusted estimates based on 
reasonable assumptions are more accurate than census estimates. Unless equally plausible 
assumptions have very different implications, we should not reject the better for failure to find 
the best. We should not settle for census estimates that are less accurate. 

F and N offer nothing to suggest that census estimates are more accurate than adjusted 
estimates. The legal findings on which they rely are no basis for a scientific argument. More- 
over, despite Schirm’s (1991) finding that the judgmental decisions made in producing census 
estimates can affect congressional apportionment, F and N fail to scrutinize census procedures 
and the underlying assumptions. Do they find plausible the census ‘‘assumption’’ that when 
the final estimates are released, everyone everywhere has been counted in exactly the correct 
location? If not, do they have any constructive suggestions for improving the accuracy of 
population estimates? Unfortunately, their critical commentary on those suggestions that 
have been offered is seriously flawed by misrepresentation and distortion and offers nothing 
constructive. 

‘‘Should we have adjusted the census of 1980?’’ as F and N ask. Maybe, maybe not. 
Although it is subject to debate, we may not have known enough about the likely effects of 
adjustment or been technically and operationally prepared to undertake an adjustment at the 
time a decision had to be made. Would adjustment have improved accuracy in 1980? We cannot 
answer with certainty because the true population is inherently unknowable and anomalies 
cannot be entirely ruled out. With that qualification, the answer is ‘‘very likely’’. 
COMMENTS 


J.A. HARTIGAN¹ 


The Adjustment Controversy 


Each ten years the United States Census prepares a list, or enumeration of names and 
addresses of persons resident in the United States. The list is subject to error in that persons 
may be omitted from the list, or erroneously included on it. In the 1990 census, Ericksen et al. 
(1991) estimated there to be 13 million erroneous enumerations and 17 million omissions. Even 
if these estimates are off by a factor of two, it appears that the enumeration needs some 
adjustment. 

Freedman and Navidi discuss statistical evidence presented in a law suit intended to force 
the Bureau of the Census to adjust the 1980 enumeration. The origin of the law suit is the 
differential undercount between races. The undercount is perhaps 5% for Blacks and Hispanics, 
and 1% for Others. Since the undercount is greater for minorities, those localities with larger 
fractions of minorities press for an adjustment in the census figures that would adjust for the 
undercount. The undercount has been established by Demographic Analysis (counting births, 
deaths, emigration, and immigration by race, sex, and age) in censuses since 1940, and by Post 
Enumeration Surveys, surveys to obtain a more accurate count in a sample of the population, 
since 1970. The size of the undercount is an important point of dispute, since it makes more 
sense to estimate and correct for a large differential undercount than for a small one. Freedman 
and Navidi concede that there may be a differential undercount, but assert, at least for the 
1980 census, that the undercount is not sufficiently well estimated in different localities to make 
adjustments feasible. Freedman and Navidi are concerned mainly to criticize proposed tech- 
niques for doing the adjustment; what are their own estimates of the undercount? For example, 
do they agree that the national undercount is as high as 5% for Blacks and Hispanics compared 
to 1% for others? I will argue later that if the differential undercount is that high, a synthetic 
adjustment (each minority person weighted 1.05, each majority person weighted 1.01) will 
probably improve estimates of state population shares. 

In 1980, the Bureau conducted a Post Enumeration Program which it intended to use in 
adjustment. The bureau decided not to adjust, on the grounds that the PEP estimates were 
not sufficiently accurate or reliable to give improved counts in small localities. This paper 
reprises Freedman’s testimony in the court case which followed, in which the court’s decision 
supported the Bureau of Census. It may be of interest to report some of the later developments, 
which show that the issues raised in the present paper are still very much alive. In the 1980’s 
the Bureau planned a more substantial Post Enumeration Survey for the 1990 Census. A dress- 
rehearsal PES was run in 1988. Some 20 evaluation studies to handle various types of error 
in the PES were planned and carried out after the 1990 Census. In 1988, the Secretary of 
Commerce announced that there would be no adjustment of the 1990 census. The government 
was sued by various localities with high fractions of minorities. The secretary then agreed, on 
17 July 1989, to continue planning for the PES, to appoint a committee of 8 experts who would 
advise the secretary on the feasibility of adjustment, and to publish a set of guidelines under 
which the Census enumeration would be adjusted or not. The external committee met frequently 
with Census officials, and advised them on planning, execution and analysis of the PES. The 
evaluative analyses were carried out by the bureau, and on 21 June, 1991, the steering committee 
in the census, with some dissent, recommended to the secretary that the census be adjusted. 


¹ J.A. Hartigan, Department of Statistics, Yale University, New Haven, CT USA 06520-2179. 
The recommendation of the committee was considerably weakened a few days later when an 
earlier analysis was found to be in error. The external committee divided into two groups of 
four. The first of these, Ericksen, Estrada, Tukey, and Wolter, with the aid of many 
consultants, wrote an extensive report that found many defects with the original enumeration, 
and strongly urged adjustment. The second group of four, Kruskal, McGehee, Tarrance, and 
Wachter, are as strongly opposed to adjustment. Wachter, with the help of some consultants, 
offers an alternative statistical analysis of the PES that suggests the range of plausible adjustments 
is so wide as to have quite different effects on reapportionment and other distributive 
requirements of the Census figures. The secretary decided that the statistical foundation for 
adjustment was inadequate and recommended against adjustment. The Department of 
Commerce was sued by the same localities that sued in 1980. The 1980 court case is thus being 
replayed after the 1990 census. 


Synthetic Adjustment 


A simple synthetic scheme is to multiply each minority person actually enumerated by 1.05 
and each majority person actually enumerated by 1.01. I agree with Freedman and Navidi’s 
rejection of Schirm and Preston’s (1987) analytic argument. 

What about the following analytic argument? Suppose that national undercounts are 
correctly estimated, but the undercounts differ over states; when does the synthetic adjustment 
improve the estimate of a state’s proportion of the national population? The answer is, if the 
synthetic adjustment is an overadjustment for a particular state, it is closer to the true proportion 
than the enumerated proportion if and only if the minority fraction in the state is less than 
the national minority fraction; conversely, if the adjustment is an underadjustment for a 
particular state, it is closer to the true proportion than the enumerated proportion if and only 
if the minority fraction in the state is greater than the national minority fraction. It is plausible 
to expect the undercount for minorities and non-minorities in high minority states to be higher 
than in low minority states, which would cause the synthetic adjustment to be an under- 
adjustment, but nevertheless to be an improvement on the enumerated proportion. 

National undercount rates of 5% and 1% are supported by historical evidence from the 
Bureau, both by demographic analysis and post enumeration surveys (Tables 1 and 2). In a 5-1 
adjustment, we multiply non-minority enumerations by 1.01 and minority enumerations by 
1.05. Will this improve apportionment of Congressional seats to the various States? 

The actual minority and non-minority populations in the different States are unknown. We 
are comparing the two estimates of the State populations based on the unadjusted and adjusted 
Census. The census will do best when the States with high minority populations actually have 
a low differential undercount; the 5-1 adjustment will then overshoot the true proportions for 
those states. Correspondingly, if the States with low minority populations actually have a high 
differential undercount, then the 5-1 adjustment will undershoot the true proportions. This 
tells us how to construct a best case for the census, and a worst case for the adjustment. 

I will ignore variations in the non-minority undercount between States, as these should have 
a minor effect on the overall proportions; I will suppose that all States have a non-minority 
undercount of 1%. Suppose that the true overall minority undercount is 5%. Suppose this 
undercount might vary from 3% to 7% in the different States. I assign 3% undercounts to 
high-minority states, and 7% undercounts to low-minority states, with the division between 
high and low minority being decided so that the overall minority undercount is 5%. This 
assumption of true undercounts makes the census look best. The calculation is done for a range 
of choices of overall undercount and variations across states. 
Table 1 


Historical Estimates of the Amount and Percent of Net Undercount by Race, 
as Measured by Demographic Analysis 
(Report dated 21 June 1991 from the Bureau of Census undercount steering committee) 


             1940   1950   1960   1970   1980   1990 

Total         5.4    4.1    3.1    2.7    1.2    1.8 

Black         8.4    7.5    6.6    6.5    4.5    5.7 

Non-black     5.0    3.8    2.7    2.2    0.8    1.3 

Table 2 


Undercount Estimated by the Post Enumeration Survey and Demographic 
Analysis in the 1990 Census, by Age, Race and Sex 


                             Black                                         Non-black
                  Male                  Female                  Male                  Female
Age              PES          DA         PES          DA         PES          DA         PES          DA

Total            5.4         8.5         4.3         3.0         2.0         2.0         1.4         0.6

0-9              8.0         8.2         7.8         7.8         3.3 [illegible]         3.4         2.8
10-19            4.0         2.0         4.0 [illegible]           2        -1.0         1.8        -0.5
20-29            6.4         9.4         6.8         3.8         5.0 [illegible]         3.8         0.9
30-44            5.9        12.4         3.9 [illegible] [illegible] [illegible]         1.4         0.1
45-64            3.2 [illegible] [illegible]         0.5         0.4         2.8        -0.5         0.4
65+      [illegible]         3.0        -0.3        -1.3        -0.9         1.4        -1.1         0.4


The census and adjusted estimates are compared by computing the number of congressional 
seats that are wrongly apportioned by the two estimates of population. The number of seats 
allocated to a State with 7.2% of the population is 435 × 0.072 ≈ 31.3, suitably rounded. The rounding 
makes the actual apportionment a rather poor measuring rod for comparing two methods, 
because the misapportionment is usually only 1 or 2 seats. Instead, I will use fractional 
misapportionment, which is half the sum of absolute differences between the estimated and 
true proportions, multiplied by 435. 
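
As an illustration only, the definition can be sketched in a few lines of Python. The three states and their counts below are invented and are not the Statistical Abstract figures used for Table 3; following the text, a true count is generated by multiplying an enumerated count by one plus the undercount rate. The function names are illustrative.

```python
SEATS = 435  # size of the U.S. House of Representatives


def shares(populations):
    """Each state's proportion of the national total."""
    total = sum(populations)
    return [p / total for p in populations]


def fractional_misapportionment(estimated, true, seats=SEATS):
    """Half the sum of absolute differences in state proportions, times seats."""
    diffs = (abs(e - t) for e, t in zip(shares(estimated), shares(true)))
    return seats * 0.5 * sum(diffs)


def scale(states, minority_factors, majority_factor=1.01):
    """Multiply enumerated (minority, majority) counts by the given factors."""
    return [mf * minority + majority_factor * majority
            for (minority, majority), mf in zip(states, minority_factors)]


# Hypothetical enumerated counts in millions: (minority, majority) per state.
enumerated = [(3.0, 7.0), (1.0, 9.0), (0.5, 4.5)]
census = [minority + majority for minority, majority in enumerated]

# The 5-1 adjustment applies the same factor 1.05 to minorities everywhere.
adjusted = scale(enumerated, [1.05, 1.05, 1.05])

# Case A: truth agrees with the 5-1 rule, so the adjustment is exact.
truth_a = scale(enumerated, [1.05, 1.05, 1.05])
# Case B: the high-minority state has a 3% minority undercount, the others 7%.
truth_b = scale(enumerated, [1.03, 1.07, 1.07])

for label, truth in (("A", truth_a), ("B", truth_b)):
    print(label,
          round(fractional_misapportionment(census, truth), 2),
          round(fractional_misapportionment(adjusted, truth), 2))

# A state with 7.2% of the population is due 435 * 0.072 seats before rounding.
seats_due = SEATS * 0.072
```

Case B is the kind of configuration described above as the best case for the census: the undercount is lowest where the minority population is largest.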

It can be seen from Table 3 that the break-even point for census versus adjusted occurs 
when the true overall minority undercount rate is 3%; we expect this, because then the true differential 
undercount is 2%, half-way between the 0 rate implied by the census, and the 4% rate assumed 
by the adjustment. For higher overall minority undercounts, the census does better only when 
there is a big range of variation across states, and the states with high minority populations 
happen to have low undercount rates. For example, if the overall rate is 4%, the census achieves 
0.8 misapportionment against 1.2 for the adjustment, provided that all the high-minority states 
have a 2% undercount, and all the low-minority states have a 6% undercount. If the overall 
rate is 5%, the census achieves 0.8 versus 1.0, only if the high-minority states have a 3% 
undercount and the low-minority states have a 7% undercount. If the overall rate is greater 
than 5%, the census is better than the adjusted for no combination of undercount rates in the 
states having a range of 4% or less. 
Table 3 


Comparison of Fractional Misapportionment for the Census and a 5-1 Adjustment 
for a Range of Overall Minority Undercount Rates, with Varying Undercount Rates for the States 
(The majority undercount rate is fixed at 1%; the minority populations in each state 
are estimated from the U.S. statistical abstract, 1989) 


Overall       Minority          Minority          Census              5-1
minority      undercount for    undercount for    fractional          fractional
undercount    low-minority      high-minority     misapportionment    misapportionment
(%)           states (%)        states (%)

2             2                 2                 0.2                 0.7
2             3                 1                 0.4                 [illegible]
2             1                 3                 [illegible]         0.6
3             3                 3                 0.5                 0.5
3             4                 2                 0.4                 0.9
3             2                 4                 0.9                 0.4
4             4                 4                 0.7                 0.2
4             5                 3                 0.6                 0.7
4             3                 5                 [illegible]         0.4
4             6                 2                 0.8                 1.2
4             2                 6                 1.6                 0.9
5             5                 5                 0.9                 0.0
5             6                 4                 0.8                 0.5
5             4                 6                 1.3                 0.5
5             7                 3                 0.8                 1.0
5             3                 7                 1.8                 1.0
6             6                 6                 1.2                 0.2
6             7                 5                 1.0                 0.4
6             5                 7                 1.6                 0.7
6             8                 4                 1.0                 0.9
6             4                 8                 2.0                 [illegible]
7             7                 7                 1.4                 0.5
7             8                 6                 [illegible]         0.4
7             6                 8                 1.8                 0.9
7             9                 5                 [illegible]         0.8
7             5                 9                 [illegible]         1.4


Table 3 suggests that the overall rates would need to be 3% or less to make this crude 5-1 
rule less accurate than the census for apportionment. The 1990 PEP-based 95% ‘confidence 
intervals’ for the overall minority rate are 4.3 to 5.7; this range seems overoptimistically narrow, 
but even if we doubled the quoted margins of error, the interval is 3.6 to 6.4; if the true value 
lies in this range, then the 5-1 rule will still beat the census. 


PEP and PES Based Adjustments 


The 1980 PEP survey and the 1990 PES survey are designed to refine a synthetic adjustment 
by estimating different undercount rates in different localities. Freedman and Navidi are 
skeptical about the regression used to smooth the estimates, questioning independence, 
homogeneity of variance, and the reliability of the selection procedures for including variables 
in the model. I was not persuaded by their examples of how different sets of variables could 
easily have been selected for inclusion in the model; after all, if the predicting variables are 
highly correlated, quite different subsets can produce pretty much the same prediction. Thus 
the fact that different variables were selected does not indicate the smoothed estimates would 
be very different. Indeed, their table 10 indicates that two variables, percent minority, and 
percent conventionally enumerated appeared in nearly all equations. I would suspect that the 
assumptions of the regression cannot be easily defended, but that the results of the regression 
are reasonable, except perhaps in producing lower standard errors than are justified by the 
probable lack of independence. 

Reduction in sampling variance by regression-based smoothing procedures is not likely to 
make much difference to estimates in large localities such as States. There, the aggregation 
of different PEP or PES estimates is already doing as much smoothing as is needed, and the 
questionable regression assumptions can be avoided. On the other hand, the regression 
smoothing probably is needed if results are to be projected to small localities. 

I agree with Freedman and Navidi that missing data procedures and bias assessment in the 
post surveys are the key to evaluating the adjusted estimates. Correct handling of missing data, 
and assessing bias, requires an intimate understanding of survey procedures. Personal 
judgements by the professionals most closely involved will dominate the conclusions. A healthy 
skepticism about any resulting ‘standard errors’ or ‘confidence intervals’ is justified. 

I suggest that the right loss function to evaluate accuracy in apportionment is not squared 
error, which is statistically convenient for combining variances and squared biases, nor estimates 
of the numbers of states or localities that are better estimated by the adjustment. For appor- 
tionment, the loss function should be the sum of absolute differences between estimated and 
true proportions in the different states, because this represents the numbers of people actually 
misallocated by the estimates, and corresponds at the state level to the number of misappor- 
tioned seats. 

Although state proportions are of primary interest, let’s look at the state populations first. 
If the true undercount rate in a state is 2%, then the census is better than the estimate just when 
the estimated undercount rate is less than 0% or greater than 4%. This occurs with probability 
50% when the standard error of the estimate is about 3%; thus even a quite inaccurate estimate 
of undercount is enough to give the adjusted estimate the edge. The census has the same expected 
difference from truth as the estimate when the standard error is 2.2, and the same expected 
square difference when the standard error is 2. Out of all this comes the simple rule, that if 
the true rate is 2%, you do better than the census if you can estimate the true rate with standard 
error 2%. When estimating population proportions, rather than populations, the relevant 
computations are on the differential undercounts for the various states, the difference between 
the undercount for the state and the nationwide undercount, (not the difference between the 
races); thus adjustments do better than the census in those states where the true difference 
between the state undercount rate and the nationwide undercount rate exceeds the standard 
error of the estimated difference. 
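
The 50% figure can be checked with a short normal-probability calculation. The sketch below assumes, as the paragraph implicitly does, that the estimated rate is unbiased and normally distributed about the true rate; the function name is illustrative.

```python
from statistics import NormalDist


def prob_census_closer(true_rate, std_error):
    """Chance that the unadjusted count beats an unbiased normal estimate.

    The census errs by true_rate; the estimate errs by more than that when
    it falls below 0 or above 2 * true_rate.
    """
    estimate = NormalDist(mu=true_rate, sigma=std_error)
    return estimate.cdf(0.0) + (1.0 - estimate.cdf(2.0 * true_rate))


# True undercount 2%, standard errors of 3% and 2% (all in percentage points).
p_se3 = prob_census_closer(2.0, 3.0)
p_se2 = prob_census_closer(2.0, 2.0)
print(round(p_se3, 3), round(p_se2, 3))
```

With a standard error of 3 the census is closer about half the time; with a standard error of 2 it is closer about a third of the time, and the root mean squared errors of the two (2 for the census, the standard error for the estimate) coincide.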

Under this rule, and accepting the bureau’s 1990 estimates of undercount rates and margins 
of error based on the Post Enumeration Survey, the enumeration is estimated to do better in 
24 out of 50 states in estimating proportions. Note however, that the overall estimated loss 
is quite a bit better for the adjustment than the census, because the states with large (plus or 
minus) differential undercount rates are estimated better by the adjustment; when the census 
does better, it does just a little better; when the adjustment does better it often does quite a 
lot better. Thus the fact that 24 out of 50 states are not estimated to be improved by adjustment 
should not cause too much excitement; it just means that a lot of states have an estimated 
undercount rate that is pretty close to the national average, and there is no advantage, and 
not much difference, in adjusting them. We continue to estimate that substantially more people 
are correctly allocated by the adjusted figures than the enumeration. Table 4 gives some error 
estimates when the PES-based figures are incorrect in various ways. 

The bureau has produced a number of estimated undercounts, with margins of error, in 
the various states. I use the ‘selected PES method’ (called PES from now on) in the report of 
the Undercount Steering Committee, 21 June 1991. Now there are two popes, the enumeration, 
and the PES figures. Which is correct? Well, you need a third pope, an infallible one, to decide. 

We don’t have the third pope. The various follow up evaluations of the PES, the total error 
model, the loss function analyses, the robustness analyses, are all attempts to feel out what 
the third pope might decide, but an attitude of skepticism and caution is necessary in believing 
the decisions of the fictional third pope. In particular, the bureau’s ‘true population’ estimates 
are all variations on the PES estimates, accepting the basic accuracy and feasibility of the PES, 
and so most unlikely to find the PES at fault compared to the enumeration. The PES can only 
be found inferior by some method that is not so closely linked to it. Demographic analysis is 
by no means in complete agreement with the PES, and provides only national information, 
but on the whole, it supports the PES rather than the enumeration. 

I have done some sensitivity analyses to evaluate how far the PES estimates and margins 
of error would have to be in error for the census to look competitive. The calculations use the 
PES state undercount estimates with various multipliers, the PES margins of error with various 
multipliers, and assume that the true state figures are sampled from normal distributions with 
the multiplied PES state undercounts and margins of error. I take the different state truths 
to be independent, which is surely far from correct. The independence assumption will not seriously affect 
the individual state proportions though, so the average misapportionment of the census and 
the PES won’t be much affected; the variability of the difference will be underestimated. 
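
The structure of this calculation can be sketched as follows. This is not the program used for Table 4: the five states, their rates and their margins of error are invented, the margins of error are treated as standard deviations, and an undercount rate r is taken to mean that the true count is the enumerated count divided by (1 - r/100). All of these are assumptions made for illustration.

```python
import random

SEATS = 435


def fractional_misapportionment(estimated, true, seats=SEATS):
    """Half the summed absolute differences in state proportions, times seats."""
    est_total, true_total = sum(estimated), sum(true)
    return seats * 0.5 * sum(abs(e / est_total - t / true_total)
                             for e, t in zip(estimated, true))


def simulate(enumerated, pes_rates, pes_errors, rate_mult, error_mult,
             n_sims=100, seed=0):
    """Average misapportionment of (census, adjusted) over simulated truths.

    Rates and errors are in percent; the adjusted figures always use the
    unmultiplied PES rates, while the truths use the multiplied ones.
    """
    rng = random.Random(seed)
    adjusted = [n / (1.0 - r / 100.0) for n, r in zip(enumerated, pes_rates)]
    census_sum = adjusted_sum = 0.0
    for _ in range(n_sims):
        true = []
        for n, r, e in zip(enumerated, pes_rates, pes_errors):
            # Independent normal draw for each state's true undercount rate.
            true_rate = rng.gauss(rate_mult * r, error_mult * e)
            true.append(n / (1.0 - true_rate / 100.0))
        census_sum += fractional_misapportionment(enumerated, true)
        adjusted_sum += fractional_misapportionment(adjusted, true)
    return census_sum / n_sims, adjusted_sum / n_sims


# Hypothetical inputs for five made-up states (populations in millions).
enumerated = [30.0, 18.0, 12.0, 5.0, 1.0]
pes_rates = [3.0, 2.5, 0.5, 1.0, 2.0]    # estimated undercount, percent
pes_errors = [0.4, 0.5, 0.6, 0.8, 1.0]   # treated here as standard deviations

for rate_mult, error_mult in [(1, 1), (1, 2), (0.5, 0.5), (0.5, 1)]:
    census, adjusted = simulate(enumerated, pes_rates, pes_errors,
                                rate_mult, error_mult)
    print(rate_mult, error_mult, round(census, 2), round(adjusted, 2))
```

Two limiting cases serve as checks: with an error multiplier of zero and a rate multiplier of one the adjusted figures are exactly right, and with both multipliers zero the census is exactly right.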

The results in Table 4 show that the PES estimates have to be in substantial error before 
the census starts to be competitive. Accepting the PES rates and margins of error, the 
census misapportions about 4 seats, the PES about 1. If the PES undercount rates are halved, with the 
margins of error remaining fixed, then the misapportionment for the census is 2.5 seats, 
and for the PES 1.5 seats, and the census will be better in about 40% of the true cases. 

However this analysis is in line with the loss function analyses in that it takes the PES as 
its starting point. 


Table 4 


Misapportionment of the Enumeration and PES-adjustment, when the True Figures 
are in Accordance with the PES-undercount Rates and Margins of Error, 
with Various Multipliers for the Undercounts and Error Margins 
(Based on 100 simulated true counts.) 


Multiplier for       Multiplier for     Census            Adjusted          Standard
PES undercount in    margin of error    misapportioned    misapportioned    deviation
each state           in each state      seats             seats             of difference

1                    1                  3.8               1.1               2
1                    2                  3.7               2.0               3.3
0.5                  0.5                2.8               [illegible]       1.3
0.5                  1.0                2.5               1.6               1.4
0.75                 0.75               3.3               0.9               [illegible]
0.75                 1.50               3.3               1.6               3
I wonder if it might not be useful to make a distinction between the enumeration, that lists 
names and addresses, the count, that counts the number of people in various localities according 
to the list, and estimates, using statistical procedures based on various sources of information 
such as demographics and supplementary lists. This would be perhaps politically divisive, since 
interested parties would wish to allocate according to the figures most favourable to themselves. 
There would be the danger that the census professionals, with several estimates available, would 
be subject to political pressure to choose estimates favourable to one or other group. If we 
want to estimate populations accurately though, it is a good principle to base the estimates 
on several mutually supporting surveys rather than a single one. The danger of relying on a 
single list outweighs the dangers and difficulties in combining information from different 
sources. For one thing, the only way to find out how accurate a survey is, is to compare it to another 
survey in one form or another. It should be noted that even in the ‘unadjusted’ census, population 
estimates are not simple counts off an enumerated list. Individuals known to be fictitious are 
included in the count by various kinds of imputation procedures that handle missing data. 

Perhaps Freedman and Navidi are right, in asserting that the 1980 PEP figures were too 
unreliable to permit their use in adjusting the census figures. Perhaps they were right in saying 
that a nationwide synthetic adjustment is too crude, and has not been demonstrated to improve 
accuracy. Yet omissions and erroneous enumerations in the tens of millions suggest that some 
kind of adjustment might improve accuracy. There is plenty of room for improvement. We 
could misguess the existence or location of a few million people and still be competitive with 
the raw enumeration. 


I have some questions for Freedman and Navidi. 


(1) Do they agree with these estimates of 13 million omissions and 17 million erroneous 
enumerations? 

(2) Is the nationwide differential undercount between Blacks and Whites 4%? 

(3) Is the PES a useful tool for assessing accuracy of the first census? 

(4) Should the PES follow-up sample be used to correct the first census, not only in the specific 
instances of erroneous enumerations and omissions discovered by comparing the surveys, 
but also by projecting differential undercounts discovered in the follow up sample to the 
whole census? If so, how? 

(5) If the PES is not good enough, how should the follow-up survey be designed so that it could 
be used to adjust the census? 
COMMENT 


T.P. SPEED¹ 


Freedman and Navidi ask ‘‘Should we have adjusted the census of 1980?’’ and answer no. 
I take this as meaning that they have yet to see compelling evidence that it could have been 
done well, not that they do not see a problem, and not that they think adjustment intrinsically 
undesirable. My interest in census adjustment was aroused about four years ago, shortly after 
I came to the U.S. I have read the papers by the main participants in the debate, and have 
recently had the opportunity to examine some block-level data from the 1990 Post Enumeration 
Survey. My conclusion is the same as that of Freedman and Navidi: there is simply no evidence 
to show that adjustment will work at the level proposed. 

There are two features of the arguments for adjustments that I find particularly striking. 
Absolutely no use is ever made of ‘‘ground truth’’ data to demonstrate clearly that adjustments 
do improve upon the census. And no use is made of available data to justify the key assumptions 
on which adjustment methods are based. 

In 1990 adjustment was to be at the census block level. This would have meant that on the 
basis of a sample of about 12,000 blocks, each of 6.5 million blocks would have been adjusted. 
Adjusting a census block count means adding or subtracting people with specific characteristics 
before further aggregation. This would have been done using procedures based on unverified 
and implausible assumptions concerning the undercount mechanism. The most important such 
assumption is that undercount rates are constant within 1,392 demographic subgroups of the 
population called poststrata, defined by region, race, sex, age and status as a home owner or 
renter. One such consists of all non-black male Hispanic renters aged 30-44 living in Los Angeles 
city, or in central cities in the Pacific Census Division (California, Oregon, Washington, Alaska, 
Hawaii). Another consists of all female owners aged 20-29 living in central cities of 250,000 
or more, excluding New York City in the Mid-Atlantic Census Division (New York, New Jersey, 
Pennsylvania) who are not black, Hispanic, Asian or Pacific Islanders. The parallel with the 
regression models in the present paper is clear. 

Examination of block-level data from the 1990 Post Enumeration Survey from sites in 
Detroit and Texas showed that the assumption of constancy of the undercount rates within 
1,392 poststrata is no better supported than a quite different one: that the undercount is driven 
by blocks, and is constant across poststrata within blocks. This dual model would have led 
to different block-level adjustments. The analysis is difficult because the counts of people in 
the intersections of blocks and poststrata are quite small, heterogeneous, and mostly zero. 
Details can be found in Hengartner and Speed (1992). 

Of course I do not know whether the poststrata-driven or the block-driven undercount model 
is better; we would need something like ‘‘ground-truth’’ data to answer that. But we can see 
that certain key assumptions concerning the 1990 undercount model are no better supported 
by available data than those of a quite different model. In my view, when changing assumptions 
changes the results, and when we have no way of telling which set of results is closer to the 
truth, then we have no business adjusting. This is the message I get from the present paper, 
and it is one I wholeheartedly support. 


ADDITIONAL REFERENCE 


HENGARTNER, N., and SPEED T.P. (1992). Assessing between-block heterogeneity within poststrata 
of the 1990 Post-Enumeration Survey. Submitted to Journal of the American Statistical Association. 
COMMENT 


EUGENE P. ERICKSEN and JOSEPH B. KADANE¹ 


“The Court (Judge Sprizzo): I take it your standard error should be a fixed statistical 
number which you then subtract from your results and you get what is left, basically 
which is supposed to measure the accuracy of what you are measuring? 


The Witness (David Freedman): I hate to argue with you, but it isn’t quite like that.” 
(Cuomo v. Baldrige: 2629). 


We welcome the opportunity to continue the debate with Freedman and Navidi (F and N). 
Although Judge Sprizzo’s decision is now more than 4 years old, the statistical issues are impor- 
tant ones and deserve continued attention. This is especially true because the final scientific 
judgements are best made by statisticians and demographers, rather than judges and politicians. 
In this article, Freedman and Navidi review their side of the adjustment controversy, explore 
some new arguments, and try to use Judge Sprizzo’s legal decision to support their scientific 
position. In this comment, we reexamine certain critical points, restating and clarifying our 
position where necessary in an effort to demonstrate how adjusting the 1980 Census would 
have made the data more accurate. 

Our disagreements with Freedman and Navidi are fundamental, and we agree that they go 
to the heart of statistical inference. In their conclusion, F and N write ‘‘success of any of EKT’s 
proposed adjustments rides on unverified and implausible assumptions (p. 19).’’ To the 
contrary, we believe that our assumptions are realistic and verified by decades of census-taking 
knowledge, as we will argue below. For their part, F and N’s arguments boil down to little 
more than concern that some assumptions may not be true. To criticize a statistical argument 
however, it is necessary to do more than that. Assumptions are usually not true exactly - the 
relevant question is how far they are from being exactly true and what that means for the 
intended uses of the data. At a minimum, one must show that other assumptions, argued to 
be just as realistic, or more realistic, lead to substantially different conclusions. F and N do 
none of this. Moreover, although they concentrate upon the minor differences in various 
adjustment possibilities, they make no attempt to demonstrate that the adjustments would result 
in estimates with larger errors than the unadjusted census. 

An important part of the disagreement concerns whether or not it is proper to use what we 
know about the census. F and N give no weight to evidence of greater census-taking problems 
in some areas than others, and give no credit to the fact that the PEP-measured omission rates 
and undercounts are higher in those areas with lower mailback rates, higher rates of missing 
data, and greater problems maintaining the specified long-form sampling rate on the census. 
Nor do they give any credence to the consistency of the racial differentials in undercount 
provided by demographic analysis for every census since 1940. This information is not relevant 
to them, and they are quick to criticize us whenever we rely upon ‘‘unverified’’ assumptions, 
no matter how realistic or warranted. They also do not explain what ‘‘verification’’ is to them. 

At the same time, Freedman and Navidi were not able to make their own argument without 
reliance upon assertions which are either unverified or are based on the very PEP data which 
they criticize us for using. Here are some examples: 


¹ Eugene P. Ericksen, Temple University, Philadelphia, PA USA, and Joseph B. Kadane, Carnegie-Mellon University, 
Pittsburgh, PA USA 15213. 
1. A small undercount is thought to remain in the census (p. 3). 

2. The census also had a small amount of erroneous enumerations (p. 4). 

3. The undercounts estimated by PEP are likely to be biased upward (p. 12). 

4. The eruption of Mt. St. Helens caused correlated error between the original enumeration 
and the PEP (p. 12). 

5. Missing data caused a bias in the PEP (p. 13). 

6. Minority persons living in central cities are likely to behave differently from those in suburbs 
(p. 16). 

7. The undercount in conventional areas was relatively high (p. 18). 


F and N seem to believe that, in the absence of substantial direct information about the quality 
of the PEP data, we should not adjust, since different assumptions sometimes lead to 
different results. This argument, however, ignores the well documented errors in the census 
enumeration. We have provided extensive documentation, not just in the EKT article, but 
elsewhere (Ericksen 1983; Ericksen and Kadane 1985) of problems in the census, and others 
(Citro and Cohen 1985; U.S. Bureau of the Census 1985, 1986 and 1988) have found similar 
results. To us, the substantial evidence of census-taking problems, the geographic coincidence 
of census-taking problems with high PEP undercounts, and the consistency of the PEP series 
with each other and with the results of demographic analysis provide ample assurance 
that the additional information derived from the PEP data could have been used to adjust - and 
improve the accuracy of - the 1980 Decennial Census. This summarizes our general point of 
view. In the sections that follow, we address some of Freedman and Navidi’s specific arguments. 


Do the Simple Adjustments Improve Upon the Census? 


In their Section 2, F and N criticize our Table 5, in which we claim to show the general 
agreement of 14 different adjustment schemes. Each of them shifts population share from 
predominantly White areas outside of cities, where census-taking problems were low, to large 
central cities with substantial minority populations, where census-taking problems were great. 
F and N conclude: ‘‘The table does not show that any of the methods improve upon the census. 
It cannot, because there is no external standard against which to measure improvement’’ (p. 6). 
If what F and N mean is that the ‘‘true’’ population is unknowable, then their argument, of 
course, goes too far and no adjustment could ever meet their requirements. 

In the EKT paper, we relied upon Schirm and Preston (1987) to show that a simple synthetic 
method (our Synthetic B) improved upon the census. Since they are also commenting upon 
Freedman and Navidi’s article, we will not repeat their arguments. Given the improvement 
provided by Synthetic B, we would expect further improvement to be provided by more realistic 
assumptions, namely that minority populations would be more difficult to count in areas where 
census-taking problems are greater. These assumptions are consistent not only with PEP results, 
but with the result of a separate Census Bureau study of New York City which showed that 
omission rates were strongly and negatively correlated across district offices with mailback rates 
(Ericksen and Kadane 1986). 

Freedman and Navidi base their argument on the apparent differences in the adjusted 
distributions provided by the different PEP series. We do not believe this evidence to be perti- 
nent, since we know that the eight ‘‘preferred PEP’s,’’ as well as the more reasonable 
Synthetic A, will be different not only from Synthetic B, but also from the four less preferred 
PEP’s. Among the six preferred PEP’s based on April data, the average rms difference is 
0.07%. Differences between these and the two preferred August PEP’s are larger, but we 
explained in our paper why we thought the April and August data were different. More 
importantly, all the 14 adjustments improve upon the census by shifting population shares from 
areas where census-taking problems were low to areas where they were high. The fact that some 
of the adjustments, e.g. Synthetic B, make only small adjustments is no argument against 
making any adjustment. 

F and N raise some additional questions, each of which we easily dispose of. First, Freedman 
and Navidi appear to disagree with our strategy of incorporating information from several 
sources. However, there was nothing about the sources of information that made combining 
them inconsistent or unusual. Since we start from the proposition that additional information 
is generally a useful thing, we do not find any merit in F and N’s criticism on this point. Second, 
finding that the demographic method does not give a decomposition of the undercount that 
is geographically detailed, they set it aside as if it had no use. For us, the demographic method 
gives at least two important pieces of information. It gives a reliable estimate of the national 
undercount, and it also gives a powerful covariate: Blacks are undercounted more than Whites. 
Neither of these estimates should be taken to be without error, but they certainly give us 
confidence that each of the preferred PEP series coincides with these observations. 

Finally, they found irrelevant our Table 6, which showed that omission rates, relative to rates 
of erroneous enumeration, were high in those areas with high undercounts. Turning to EKT’s 
Table 5, we find it to be consistent not only with Table 6, but with the results of demographic 
analysis. This increases our confidence in the utility of the PEP. The argument is called 
‘‘convergent validity,’’ and is commonly made in the social sciences. It should also be noted 
that the series we do not take seriously because of the implausibility of their assumptions, Series 
10-8, 14-8, 14-9, and 14-20, are less coincident with demographic analysis. We find: 


1. Validation of the 8 preferred series, because their national undercount rates and the results 
in areas in which Blacks are concentrated are consistent with the demographic results, and 


2. Evidence that Series 10-8, Freedman and Navidi’s foil, is indeed an outlying series. 


Can We Expect Improvements in Small Areas? 


In our court testimony, we were concerned mainly to show that improvements could be 
expected for the 66 areas defined as PEP sampling areas. In a separate document, Tukey (1983) 
showed that if improvement was to be obtained in larger areas, then it could also be expected 
on average in its smaller components. Since then, both conceptual advances and empirical 
verifications (Ericksen et al. 1991, Appendix H; Wolter and Causey 1991) have been obtained. 


Averaging and Sensitivity Analysis 


Freedman and Navidi assert that ‘‘it is the spread in the PEP series that is interesting, not 
the average - because it is the spread that demonstrates the impact of applying different 
modeling assumptions to the same data (p. 11).’’ We differ from F and N in two ways. First, 
we believe that both the spread and the average are relevant, and we discussed each. Second, 
and more importantly, we used a different measure of the spread, the root mean squared error 
(rmse) instead of the range. F and N give little argument to support their choice of statistic. 
We prefer the rmse because it takes all the data into account, and the squared error feature 
gives extra weight to large errors. We found that ‘‘The root mean squared error among all 792 
residuals is 0.59. In contrast, the root mean square of the 66 area effects is 1.60. The area effect 
is more than double the root mean square residual 47 of 66 times (EKT, p. 938).’’ We also 
showed that when we restricted attention to the ‘‘preferred eight,’’ the root mean squared 
residual was 0.33, and the area effect was more than double the rmse 59 of 66 times. 
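
The contrast between the two measures of spread can be illustrated with a minimal sketch. The residuals below are hypothetical, chosen only for illustration, and are not taken from EKT or from F and N; the function names are likewise ours. Two sets of residuals with the same range can have quite different root mean squares, because the range looks only at the two extreme values.

```python
import math

def rmse(values):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical residuals in percentage points (not data from either paper).
mostly_small = [-1, 0, 0, 0, 0, 1]    # two extreme values, the rest zero
all_large = [-1, -1, -1, 1, 1, 1]     # every residual is large

# Both sets have the same range, but the rms differs.
print(value_range(mostly_small), value_range(all_large))
print(rmse(mostly_small), rmse(all_large))
```

Under these assumed values the range cannot distinguish the two sets, while the root mean square is larger for the set in which every residual is large.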

We believe that F and N’s use of the range in Table 6 and Figure 1 is wrong for another 
reason. Even among the preferred April estimates there is some difference among the national 
rates of undercount. If, as F and N say, we are concerned with shifts in shares of population, 
we should be concerned with deviations from the national average, as in our Tables 10 and 11. 
For example, in Florida the 2-20 estimate is 2.63% and for 3-8 it is 1.42%, for a range of 1.21%. 
Subtracting the national rates of 1.9 and 1.0 percent, the respective deviations are 0.73 and 
0.42 percent, for a range of only 0.31%. Use of this statistic weakens the correlation displayed 
by Freedman and Navidi. 
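
The Florida arithmetic can be checked with a short sketch. The four percentages are the ones quoted in the paragraph above; the variable names are ours and purely illustrative.

```python
# Estimated undercount rates (percent) quoted in the text.
florida = {"2-20": 2.63, "3-8": 1.42}
national = {"2-20": 1.9, "3-8": 1.0}

# Range of the raw Florida estimates across the two series.
raw_range = max(florida.values()) - min(florida.values())

# Deviation of each Florida estimate from the national rate of its own series.
deviations = {series: florida[series] - national[series] for series in florida}
deviation_range = max(deviations.values()) - min(deviations.values())

print(round(raw_range, 2), round(deviation_range, 2))
```

Rounded to two decimals, this reproduces the range of 1.21 for the raw estimates and 0.31 for the deviations from the national rates.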


Assumptions 


Freedman and Navidi argue that some of the assumptions underlying our regression model 
are ‘‘unverified’’ and ‘‘implausible.’’ As we have already argued, both in the EKT paper and 
elsewhere (Ericksen 1986; Kadane 1986), we believe that they are both realistic and based on 
a body of knowledge that has been collected for decades. F and N assert that our model improves 
upon the synthetic estimates only if it uses additional information in a sensible way, bringing 
us right back to assumptions. We believe, despite F and N’s assertions, that our assumptions 
are surely sensible, and indeed more realistic than the assumptions underlying a decision not 
to adjust. 

At the same time, we believe that it is possible to make too much of the role of modeling 
in undercount estimation. For small areas, some type of modeling is surely needed. For the 
66 areas our article was concerned about, the modeling did not usually make a lot of difference. 
For example, if we compare the mean residuals in our Table 11, which average residuals from 
the ‘‘preferred eight’’ estimates, with the corresponding mean residual of the eight sample 
estimates we find the following. For the 50 states, 46 of the residuals are within one percent 
of each other, and 48 are within one and one-half percent. The two remaining states, South 
Carolina and Tennessee, as we explained in EKT, appear to have sample estimates that are 
wrong, and the use of the regression model seems to provide a clear improvement. Turning 
to the 16 cities, five of the differences are in fact greater than two percent. For these, the sample 
sizes were smaller, and the weighted average is much closer to the regression estimate than to 
the original sample estimate. Although F and N would prefer us not to calculate the weighted 
average, we prefer to let the sample data play some role, perhaps small, to account for factors 
not necessarily included in the regression model. Either way, although we hold to our claim 
of their sensibility, we believe that the argument should be focused more on the quality of the 
PEP data than on the assumptions of our estimation model. 


Does It Matter Which PEP Series is Used? 


F and N hold to their position that there is no good reason to choose one PEP series 
over another. On the contrary, while it may be difficult to select a series from among the 
‘‘preferred 8,’’ there is good reason not to include Series 10-8 in this group. It is no solution 
simply to drop the movers from the analysis, as was done for Series 10-8, just because the August 
CPS had a problem identifying the April address of movers. As F and N themselves recognize, 
and as we learned from the PEP, movers had higher rates of omission and undercount. The 
problem with Series 10-8 is indicated in two additional ways. First, its national undercount 
rate, 0.3%, is well below the 1.4% estimated by demographic analysis. Second, the 
between-area variability is unrealistically small, as we show in Table 5. The shift in shares created 
by Series 10-8 is similar to that of Schirm and Preston’s Synthetic B which, while it improved 
over no adjustment, clearly did not go far enough. As a result, the between-area variability 
among the 10-8 estimates for our 66 areas is too low. For example, assigning equal weights 
to each of the 66 areas as Freedman and Navidi appear to have done, the between-area variance 
for the Series 2-9 estimates is more than twice the corresponding between-area variance for 
the Series 10-8 estimates. Relative to the national average, the Series 10-8 estimates are too 
low in the high undercount areas and too high in the low undercount areas. It is little wonder, 
then, that the residuals from regression are a little bit smaller for Series 10-8 than for 2-9, and 
F and N’s Table 7 has no real meaning. 


Which Explanatory Variables Should Be Used? 


Freedman and Navidi believe that when Series 2-9 was the dependent variable in regression, 
we misapplied our own rules to select the independent variables. They argue that we should have 
added the percent living in poverty to our three selections - the minority percentage, the crime 
rate, and the percent conventional. This is because all four predictors in their equation (11) 
have coefficients which are more than twice their standard errors, and this equation has a smaller 
rms residual. They go on to assert that because the coefficient for the poverty variable was 
negative, we rejected the equation that use of our statistical criteria would otherwise obligate 
us to select. In other words, they assert that we let our subjective preconceptions overrule our 
statistical sense. 

The problem with F and N’s criticism is that they did not replicate our selection procedure 
correctly. As we explained in the article, and elsewhere (Ericksen and Kadane 1985; 1987, 
Section 6), ‘‘Our estimate of the undercount rate is a matrix-weighted average of a regression 
estimate and the initial sample estimates (EKT, p. 935).’’ The observations were weighted by 
the inverses of the standard errors of the initial sample estimates. This matters, because some 
states, like South Carolina, had aberrant sample estimates and large variances, and the sample 
sizes for the 16 cities were also smaller, causing the sample estimates to be less precise. Weighting 
the data by this procedure, for example, reduced the proportion of total weights assigned to 
the cities from 24 to 12 percent. When the poverty variable was added to our chosen three in 
a weighted regression, its coefficient was less than twice its standard error, and it was therefore 
excluded. 

Freedman and Navidi also mistake a statistical decision for substantive motivation. On the 
contrary, had the poverty variable, with its negative coefficient, satisfied our statistical criteria, 
it would have added interesting and useful information to our estimates. In general, there are 
two types of areas with high rates of poverty, central cities with substantial minority populations 
and rural areas in states like Kentucky and West Virginia with small minority populations. 
Census errors are more likely to occur in either type of area than elsewhere, but the nature 
of the errors differs. In the cities, omission rates, as Table 6 in EKT demonstrates, were high, 
but in the rural areas, the rates of erroneous enumeration were high. 

The effects of adding the poverty variable can be seen by subtracting F and N’s equation 
(11) from equation (12), providing the following: 


difference in 2-9 fit = 2.23 + .041 min − .010 crime + .001 conv − .176 pov. 


In areas where the percents minority and living in poverty are both high, or both low, the 
difference may not be great. In areas with many minorities, but perhaps a slightly higher than 
average rate of poverty, the difference may be positive, but in areas with few minorities, but 
a high rate of poverty, the difference is negative. Of the 66 areas, the difference obtained from 
the above equation exceeded one percent only four times, and fell between 0.8 and one percent 
an additional six times. The ten most extreme areas are: 

Area              Equation 12    Equation 11    Difference
                                  (percent)
Maryland          [illegible]    [illegible]        1.1
Houston           [illegible]    [illegible]    [illegible]
Washington, DC        8.1            7.2            0.9
Cleveland             4.3            5.1           −0.8
Arkansas             −0.3            0.5           −0.8
Mississippi           1.0            1.8           −0.8
South Dakota          0.4            1.3           −0.9
Kentucky             −1.3           −0.4           −0.9
Saint Louis           5.5            6.6           −1.1
Boston                3.4            4.9           −1.5
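

As a hedged illustration of how the difference between equations (12) and (11) behaves, the sketch below evaluates the difference equation quoted in the text. The coefficients come from that equation; the function name, the argument names, and the two sets of area characteristics are hypothetical and are not data for any of the 66 areas. The units of the predictors are assumed to match those used in the original regressions.

```python
def fit_difference(minority, crime, conventional, poverty):
    """Equation (12) minus equation (11), using the coefficients quoted in the text."""
    return (2.23 + 0.041 * minority - 0.010 * crime
            + 0.001 * conventional - 0.176 * poverty)

# Hypothetical area with many minorities and moderately high poverty.
many_minorities = fit_difference(minority=40, crime=60, conventional=50, poverty=14)

# Hypothetical area with few minorities and a high rate of poverty.
few_minorities_poor = fit_difference(minority=5, crime=40, conventional=50, poverty=20)

print(many_minorities > 0, few_minorities_poor < 0)
```

With these assumed inputs the first difference is positive and the second is negative, which is the pattern described in the text.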


If we simply apply equation (11) to the 66 areas, with no averaging with the initial sample 
results, the shift in shares is as follows: Group 1, +0.36%; Group 2, +0.20%; Group 3, 
−0.56%. Substituting equation (12) we get: Group 1, +0.33%; Group 2, +0.21%; Group 3, 
−0.54%. While the difference between equations (11) and (12) is easily explained, and is 
consistent with our theory of census error, it really makes little difference to the final results. 

Freedman and Navidi also return to the question of whether it was just as reasonable to 
use the percent urban as the crime rate. As we explained in EKT, use of the crime rate produced 
a lower rms residual and smaller standard errors than use of the percent urban. In their Tables 7 
and 8, F and N appear to get different results. The discrepancy is explained by the same mistake 
noted above. By using unweighted data, they did not replicate our regression procedure, hence 
they got different results. Since their strategy gives greater importance to the cities, which had 
smaller sample sizes and therefore more uncertainty, it is not surprising that the percent urban 
becomes more important in their criticism. 

Perhaps Freedman and Navidi think that our decision to weight the data by their estimated 
reliability is yet another arbitrary decision. Weighting seems obviously correct to us and is 
consistent with the strategy the Census Bureau followed in 1990. Where the observation seemed 
to be more reliable, we gave it greater weight. However, because they did not weight the data, 
much of F and N’s analysis is simply different from ours, and their results in this article are 
not pertinent to what we did. This applies to their simulation study as well, both in this paper 
and in Freedman and Navidi (1986). Had they weighted the data, F and N may well have gotten 
different results. Even so, the fact that the variables selected for regression differ is not the 
real issue. The real issue is how much the actual estimates obtained from the different regression 
equations vary. The answer to that, as we have shown above, is that the undercounts do not 
differ substantially. 


Final Comments 


Perhaps the main point of the EKT paper is that within the range of reasonable PEP series, 
for any set of predictor variables that are well correlated with the undercount, results of 
undercount estimation are similar. In the end, the resulting undercount estimates are rather insensitive 
to changes either in the predictor variables or the choice of a PEP series. By a similar token, 
we do not give much weight to F and N’s simulation results. The fact that different simulations 
adding random errors find different ‘‘best sets’’ of predictor variables does not tell us much, 
unless the distribution of the undercount turns out differently, which it does not. 

In the absence of direct evaluation data, we carried out our sensitivity analysis to see what 
effects various assumptions had on the estimates. In our view, substituting reasonable 
alternatives for the PEP series and undercount predictors made little difference. Moreover, 
the results followed a very reasonable pattern in light of the well-documented history of census- 
taking problems. In those areas where the Census Bureau had greater problems taking the 
census, the rates of omission, erroneous enumeration, and undercount were higher. In the end, 
we believe that the substantial and largely unchallenged evidence of serious census-taking errors 
combined with the consistency of estimates across choices of independent and dependent 
variables, and the agreement of the pattern of undercount with results of demographic analysis, 
provides ample reason to adjust. 

Freedman and Navidi hold the adjustment data to a higher standard than unadjusted data. 
They take on faith, and contrary to decades of Census Bureau evidence, that the unadjusted 
data are accurate, and they do not seem to be concerned with an evident pattern of bias across 
areas. At the same time, and in the absence of any direct evidence, they assume large biases 
in the PEP data, when the Census Bureau studies do not demonstrate the existence of such 
biases (U.S. Bureau of the Census 1988, Section 6F). In other words, they do not seem to place 
the unadjusted and adjusted data at the same starting point when making their analysis. In 
doing so, F and N are able to throw out ‘‘possible problems’’ as if they were real ones and 
to neglect real problems with the unadjusted census as if they did not exist. They reject 
adjustment on this basis alone. 


RESPONSE FROM THE AUTHORS 


1. Introduction 


After some general remarks, we respond to each of the discussants’ main points. There is 
some overlap among their arguments; we try to deal with each point only once. Like the other 
participants, we have learned something over the years - and from the present exchange - but 
have not changed our opinions on the central questions. One issue cannot be in dispute: Editor 
M.P. Singh deserves thanks from all sides. 


2. A Brief Outline of Adjustment 


There is a proposal to adjust the census using capture-recapture techniques. A person is 
‘‘captured’’ if they are counted in the census; ‘‘recapture’’ is in a special sample survey done 
after the census. In 1980, this survey was called PEP, or Post Enumeration Program. In 1990, 
the terminology shifted to PES, or Post Enumeration Survey. 

These surveys measure the rate at which people are missed from the census (‘‘gross 
omissions’’), as well as the rate at which people are counted in error (‘‘erroneous enumera- 
tions’’). Erroneous enumerations include babies born just after census day, people counted 
at the wrong address, etc. To a first approximation, the net undercount is estimated as the 
difference: 


gross omissions − erroneous enumerations. 


There is a significant additional complication. In 1980, sampling error was a large-enough 
problem (according to many observers) so that estimates from the survey could not be used 
directly. Instead, in EKT’s terminology, ‘‘sample estimates’’ from the PEP had to be run 
through a smoothing model to get ‘‘composite estimates.’’ In 1990, the terminology is different: 
‘‘raw adjustment factors’’ from the PES are modeled to get ‘‘smoothed adjustment factors.’’ 
But the problem of sampling error is even more salient. For more details, see Freedman (1991), 
U.S. Department of Commerce (1991a, pp. 4.2-4.18), or Wolter (1991). 
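
For readers unfamiliar with capture-recapture, a minimal sketch of the textbook two-sample (Lincoln-Petersen) estimate follows. It is offered only as an illustration of the idea, not as the Bureau's procedure: it assumes that capture and recapture are independent, that matching is error-free, and that erroneous enumerations have already been removed from the census count. The function names and all counts are hypothetical.

```python
def dual_system_estimate(census_count, survey_count, matched):
    """Textbook Lincoln-Petersen estimate of the true population size."""
    if matched <= 0:
        raise ValueError("need at least one matched person")
    return census_count * survey_count / matched

def net_undercount_rate(census_count, estimated_total):
    """Net undercount as a fraction of the estimated true population."""
    return (estimated_total - census_count) / estimated_total

# Hypothetical area: 950 correct census enumerations, 100 people found by the
# survey, 95 of whom can be matched to a census record.
total = dual_system_estimate(census_count=950, survey_count=100, matched=95)
rate = net_undercount_rate(950, total)

print(total, rate)
```

In this hypothetical case the survey finds 5 of its 100 people missing from the census, so the estimated population is 1,000 and the estimated net undercount is 5 percent. The estimate is only as good as the independence and matching assumptions behind it.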


3. The Census is Bad so the Alternatives must be Better 


Many discussants make an argument which, baldly summarized, comes down to this: the 
census is bad; the PES must be better; therefore, we should adjust. This is a confusion: it treats 
the census and the PES as alternatives. However, you cannot choose the survey instead of the 
census; at most, you can try to use the PES to correct flaws in the census. The question, then, 
is not whether the survey is better, but whether it is good enough for its intended use. 


The Secretary of Commerce framed the issue as follows: 


‘‘I concede the census’ imperfections, but the critical inquiry ... is not how flawed the 
census is, but whether the PES can fix it .... [W]hile identifying flaws in the census 
is important for planning the next one, it simply begs the question .... Is there convincing 
evidence showing that the adjustment is more accurate than the enumeration? 
[U.S. Department of Commerce 1991a, p. 2.13].’’ 


4. Fienberg 


Fienberg defines the central issue as follows: 


‘‘At issue is both the accuracy of the census and the adjustment process. And, it is the 
substantial differential undercount, i.e., the difference between the undercount for 
Blacks and the undercount for non-Blacks and between Hispanic and non-Hispanic, 
that is important when we come to assess census accuracy. This is because census figures 
are typically used to divide resources among groups in the population, resources such 
as seats in the U.S. House of Representatives; seats in state legislatures; federal funds; 
and so on. [p. 25, emphasis omitted].’’ 


We think this is misleading. The argument is about shares: more specifically, the accuracy 
of shares computed from adjusted figures and from the census. But the shares that matter are 
for geographical areas - states, cities, counties, and so forth. The total share of blacks or 
hispanics in the U.S. population, at the national level, matters much less. Seats in Congress 
are allocated to states, and within states to geographical areas. They are not distributed to 
national racial or ethnic groups. Similarly, tax moneys go to some 39,600 local governments, 
defined by area not race or ethnicity. The crucial issue is whether adjustment improves the 
accuracy of population shares for geographical areas rather than groups. 


Fienberg is misleading at other points as well. We give two examples. 


(i) Fienberg (p. 27). ‘‘[Freedman and Navidi] focus on the variation amongst the full set of 
12 alternatives, some of which to me are implausible given the assumptions that they rely 
upon.’’ But we did study variation among EKT’s preferred series rather than the full set: 
see pp. 7-8 and 13-14. We did this not because we agreed with EKT’s choices, but to make 
irrelevant Fienberg’s kind of argument. That didn’t stop him. 


(ii) Fienberg (p. 27). ‘‘I read the report by Ylvisaker (1991) who reexamined data from a trial 
census in Los Angeles in preparation for 1990, but I could not find the evidence that 
Freedman and Navidi state is supportive of their claim that smoothing increases 
variability.’’ Ylvisaker did a bootstrap experiment using data from Los Angeles, where 
there was a test census and a test post enumeration survey in 1986. At the tract level, 
bootstrap SEs for the smoothed estimates are generally larger than the SEs for the raw 
estimates. (See Ylvisaker’s Table 3; smoothing reduced the SEs in 19/61 tracts, increased 
the SEs in 26/61, and the remaining 16/61 were ties; at the block level the effects go the 
other way but are small in either case.) For the whole site, the comparison is as follows 
(Ylvisaker p. 7): 


SE for smoothed estimate = 0.75. 
SE for raw estimate = 0.68. 


As we said (p. 17), ‘‘smoothing may actually increase sampling error.’’ 


Nothing is Perfect, and don’t Let the Best be the Enemy of the Good 


Fienberg says (p. 27), 


‘‘A familiar theme in various writings by one of the present authors is the problems 
that arise when assumptions are not satisfied. Here again the authors pursue this theme 
with respect to the linear equation used for smoothing. They appear to argue that either 
all assumptions are perfectly justified or ‘all bets are off.’ Nothing could be further from 
the truth.’’ 

Alas, our position is more complicated than that. We think the census is imperfect, but good. 
We think the smoothing models are quite questionable, and the arguments to defend them are 
bad. Proponents of adjustment have an obligation to state their assumptions and produce data 
to validate them. The models don’t have to hold perfectly, but departures from assumptions and 
their impacts need to be studied. Otherwise, the algorithms have no justification except familiarity. 


5. The Burden of Proof 


As the exchange with Fienberg indicates, modelers are reluctant to accept the burden of 
proof. Once they make an assumption, it is taken as truth unless it can be disproved. Even 
then, they may view the assumption as useful until it can be replaced by some other assumption. 

Language is used in a specialized way. An assumption is ‘‘reasonable’’ if the modelers think 
it is reasonable. If questioned, they introspect again. The introspection confirms the original 
conclusion; after all, the assumptions are by now familiar parts of the technical literature. The 
modelers become indignant at those who do not share the faith. If all ‘‘reasonable’’ options 
favor adjustment, arguments on the other side must be ‘‘unreasonable.’’ 

So far as they are reported, the modelers’ thought experiments do not seem especially 
rigorous; and the pro-adjustment argument can be peculiarly non-empirical. Illustrations 
follow. One axiom in the Ericksen-Kadane smoothing model is independence. See equations 
(1-6) in our paper. Independence drives the variance calculations, because small correlations 
can have a big cumulative impact. Variances determine whether smoothing is a help or 
hindrance. The independence assumption matters. 


As far as we can see, the adjusters’ main arguments for independence are the following: 


(i) The errors are not perfectly correlated. (We adapt to present context an argument by 
Madansky 1986, p. 29.) 


(ii) (a) ‘‘The 1980 census was administered by more than 400 district offices, an average of 
eight per state. (b) To our knowledge no one has suggested that there actually was an April 
snowstorm or any other event that affected the census in neighboring states. (c) When 
we correlated PEP estimates for cities with the corresponding estimates in their states 
(e.g., Detroit with the remainder of Michigan), we found no evidence of a correlation.’’ 
[EKT, p. 931; we responded to (b) by noting the eruption of Mt. St. Helens.] 


(iii) ‘‘Surely they don’t expect anyone to believe the argument that the eruption of 
Mt. St. Helens interfered with census taking in a serious way ....’’ [Fienberg, p. 27.] 


In fact, what we expected from the modelers was serious argument about the validity of 
assumptions, rather than intuitions about possible sources of dependence like snowstorms and 
volcanoes. Over time, the force of that expectation has dwindled. Real empirical evidence is 
hard to get, on both sides. Their mainstay is the rhetoric: Nothing is perfect, so anything goes. 
That is the adjusters’ standard for the models. On the other hand, the census is required to 
be right to within a few percentage points - where ‘‘right’’ is defined by the models. 


6. Fellegi 


We agree with many of Fellegi’s points. In the U.S., for instance, small-area income data 
help determine funding allocations. These data have weaknesses of their own, not addressed 
by census adjustment. Likewise, there are substantial shifts in the population between censuses. 
Better income data, or a mid-decade census, might be more useful than any adjustments to 
the decennial census. 

There is one point we would like Fellegi to consider. A decision to adjust the census, whether 
in the U.S. or in Canada, has major organizational costs: it encourages the replacement of 
data collection by modeling. 


‘‘In sum, real data (with real flaws) would be replaced by complicated and poorly tested 
mathematical models of data. We do not see that as progress.’’ [Beran et al. 1988.] 


7. The PEP Series 


In 1980, there was a substantial amount of missing data in the surveys used to assess census 
error. Different ways of filling in the missing data lead to different estimated undercount rates. 
In the end, the Bureau had a dozen different PEP series: each provides estimated undercount 
rates for 66 geographic areas (central cities, states apart from their central cities, whole states). 
A series is identified by a pair of numbers, e.g., PEP 2-9 or PEP 10-8. For more details, and 
arguments about the merits of the various series, see FN p. 4, EKT p. 929, Fay et al. 1988, p. 63. 


8. Cressie 


Cressie agrees (p. 32) that in 1980, ‘‘data and methods were inadequate for an accurate 
adjustment of the whole country.’’ Of course, many of the arguments are relevant to the 1990 
decision, and on those, Cressie’s opinion may differ from ours. This is not the place for an 
extended discussion of 1990, but we can respond to some of his points, at least in outline. 


Demographic Analysis 


Cressie - like other discussants - relies on estimates from demographic analysis, a technique 
that uses administrative records (birth certificates, death certificates, etc.) to make an 
independent estimate of the total population. For details, see Fay et al. (1988). 

How good is demographic analysis? It may be surprising to some, but government statistical 
agencies keep changing their minds about the past. The estimated GNP for a year in the past 
- 1985, for example - depends on the year in which the estimate is made. The numbers keep 
on changing, and the revisions give some clues about the reliability of the initial data. 

Table A gives a brief history of revisions to demographic analysis for the 1980 census. As will 
be seen, the numbers are far from stable. The difference between estimates made in 1984 and 
1988 may reflect new understanding about the role of illegal immigration. The change from 
1988 to 1991 may reflect the impact of adjustments to earlier adjustments intended to correct 
for under-registration of births in the period 1935-1960. Apparently, these were over-adjustments, 
which may now have been fixed. 


Table A 


A short history of revisions to demographic analysis of the 1980 census: 
Estimated undercounts by date of estimate 


                 1984    1988    1991 
All races         0.5     1.4     1.2 
Blacks            5.3     5.9     4.5 
Non-Blacks       −0.2     0.7     0.8 
Differential      5.5     5.2     3.7 


Source: Col. 1. Cressie, citing Passel and Robinson (1984); the figure for all races is derived. 
Col. 2. Fay et al. (1988, p. 95, series DA-2). 
Col. 3. U.S. Department of Commerce (1991c, Table 3). 

Demographers can use data from administrative records to estimate the population of the 
U.S., and they seem to get it right to within a percentage point or two - a remarkable 
achievement. However, it seems unlikely that the errors are much less than a percentage point. If so, 
demographic analysis may not be reliable enough for adjusting the census. 


Modeling 


Cressie says (p. 33), 


‘‘The important concept to maintain is that true undercount in regions is unknown and the 
ignorance is quantified into a probability model. The goal is not estimation of the 
coefficients β but prediction of the undercount. With an error term that does not have to be 
independent and identically distributed, this prediction is insensitive to 
misspecification ... [emphasis omitted].’’ 


We disagree. A model for one investigator’s ignorance is no basis for public policy. And results 
must depend strongly on specifications. To illustrate, we note some of the assumptions in the 
model developed by Cressie (1988). Equations (2.7) and (2.10) in that paper effectively rule out 
nonsampling error in the PES, as well as systematic variation in undercount rates across 
geographical areas; and no correlations appear. Why? Equation (2.10) specifies a sampling variance 
which avoids the internal inconsistencies in the Ericksen-Kadane model (Cressie 1988, p. 193). 
However, logical consistency does not imply empirical truth. Where does the real sampling 
design come in? Finally, why should we use Cressie’s loss function (2.15)? Until Cressie answers 
these questions, and others like them, his model outputs have no claim to be taken seriously. 

When Cressie gets down to cases, he is computing estimated risks (expected losses). See his 
equations (2.28-2.31). That means he has to compute variances. Variances are extremely 
sensitive to assumptions, as Cressie knows: 


Needless to say, these results rely on the correctness of the assumed model. [p. 193]. 


An elementary illustration may help. Suppose $\epsilon_1, \ldots, \epsilon_{66}$ are exchangeable, with mean 0, 
variance $\sigma^2$, and pairwise correlation $\rho$. Now 


\[ \binom{66}{2} = 2145. \]


Therefore, 


\[ \operatorname{Var}(\epsilon_1 + \cdots + \epsilon_{66}) = (66 + 2 \times 2145\,\rho)\,\sigma^2. \]


In this game, a correlation of, say, 0.05 makes a huge difference. And correlations that small 
would be quite hard to detect empirically. Cressie doesn’t try. (With 16 data points, even a 
correlation of 0.5 might be hard to estimate, so EKT’s test #3 on p. 931 cannot have much 
power.) 
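
A short numerical sketch of this point follows. It evaluates the variance of a sum of n exchangeable errors, (n + 2 C(n,2) ρ) σ², for ρ = 0 and ρ = 0.05 with n = 66. The function and variable names are ours and purely illustrative; σ² is set to 1 so that the output is in units of σ².

```python
from math import comb, sqrt

def variance_of_sum(n, rho, sigma2=1.0):
    """Var(e_1 + ... + e_n) for exchangeable errors with common variance
    sigma2 and common pairwise correlation rho."""
    return (n + 2 * comb(n, 2) * rho) * sigma2

independent = variance_of_sum(66, 0.0)   # errors assumed independent
correlated = variance_of_sum(66, 0.05)   # pairwise correlation of 0.05

inflation = correlated / independent     # ratio of the two variances
se_ratio = sqrt(inflation)               # ratio of the two standard errors

print(independent, correlated, inflation, se_ratio)
```

With a pairwise correlation of 0.05, the variance of the sum is 280.5σ² rather than 66σ², a factor of 4.25, so the standard error is roughly doubled.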

The example may seem artificial. However, sampling error was a major obstacle to adjusting 
the 1990 census on the basis of the PES, even at the state level. Indeed, published data show 
that for a clear majority of states, the population shares from adjustment would be within two 
standard errors of the census shares (U.S. Department of Commerce 1991b). Such adjustments 
could result entirely from sampling error in the PES. (‘‘Loss function analysis’’ might be the 
adjusters’ response, and we discuss that briefly when answering Hartigan.) 

The standard errors, like the estimated adjustments, are outputs from a smoothing model 
akin to the EK model. Bootstrap experiments reported in Fay (1992) show that these standard 
errors are too small by a factor of 2 or so. (Fay gives the range 1.4 to 2.2, with a preferred 
multiplier of 1.7.) When it comes to computing variances, assumptions make all the difference. 


PEP 3-8 


On p. 32, Cressie agrees that the 1980 adjustment data were not strong enough to use. By 
p. 33, he wants to adjust using his model and PEP 3-8 (see Section 7 above). He seems to have 
assumed away all the problems created by non-sampling error, missing correlations, and so 
forth. If so, his calculations are unrelated to the policy questions. 


The Quality of the PES 


Cressie says (p. 33) that the PES was ‘‘well designed, well implemented, and quality 
assured.’’ So it was, relative to a typical market research survey, or perhaps even relative to 
other Census Bureau surveys. However, to fix a small error in the census, you need a sample 
survey which makes much smaller errors. And we do not believe the PES meets that standard. 
For example, the PES estimated a national undercount rate of 2.1%. Between 1/3 and 2/3 
of that 2.1% can be attributed to non-sampling error in the PES. See Mulry (1991, Table 15) 
and Bryant (1992). The PES seems to be fatally flawed. We return to this topic in answering 
Hartigan, below. 


Conclusion 


Cressie’s main point seems to be this (p. 33): 


‘‘To solve a problem as hard as adjustment for undercount, the common goal needs to 
be recognized. From there, debate should center around differences on how that goal 
might be reached. If Freedman and Navidi’s position is that the goal is impossible to reach 
(which is what they seem to have implied over the years), then it should be stated.’’ 


Let us be clear. In our opinion, PEP could not solve the problem in 1980, and the PES cannot 
solve the problem in 1990. Nor are we optimistic about the year 2000, whatever acronym may 
be in use then. If you can’t count them, you shouldn’t make them up afterwards by running 
capture-recapture data through smoothing models. 


9. Passel (1987) 


Many of the discussants defend synthetic adjustment, some very strongly. Few of them are 
much taken with our counter-example (Table 3). However, Passel (1987) used 1980 census data 
to show that synthetic adjustment was unlikely to improve accuracy. His work was summarized 
in the Appendix to our paper. No discussant responds to his argument. 


10. Schirm and Preston 
The Counter-example 
SP (1987, p. 966) make a claim about synthetic adjustment: 


‘Our finding is that synthetic adjustment will always move the estimated ratio of a state’s 
population to the national population closer to the true ratio if (a) the state’s black under- 
count is closer to the national black undercount than it is to the national undercount for 
both races combined and (b) the state’s white undercount is closer to the national white 
undercount than it is to the national undercount for both races combined.’’ 


Our counter-example (Table 3) showed this result to be wrong. They should concede the 
point. 

Under some conditions, and by some criteria, synthetic adjustment is doubtless a good thing 
to do; see, e.g., (15) in our paper. The result in SP’s (1987) appendix is correct but not 
illuminating: the inequality they assume in equation (A.2) on p. 976 is exactly the inequality 
on absolute error they seek to prove, up to multiplication by a scale factor. 
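
For readers unfamiliar with the mechanics, the following sketch shows what a synthetic adjustment does to state shares. It assumes the usual description of the method (each race group's census count in a state is scaled by a national factor for that group); the two states, their counts, and the undercount rates are made-up numbers, not data from SP or from the census.

```python
# Hypothetical census counts by state and race group (made-up numbers).
census = {
    "State A": {"black": 200_000, "white": 800_000},
    "State B": {"black": 50_000, "white": 950_000},
}
# Assumed national undercount rates, in percent of the true count.
national_undercount = {"black": 5.0, "white": 1.0}


def synthetic_adjust(counts, rates):
    """Scale each group's count by its national factor 1 / (1 - rate/100)."""
    return sum(c / (1 - rates[g] / 100.0) for g, c in counts.items())


adjusted = {s: synthetic_adjust(c, national_undercount) for s, c in census.items()}
total_census = sum(sum(c.values()) for c in census.values())
total_adjusted = sum(adjusted.values())

shares = {}
for state in census:
    before = sum(census[state].values()) / total_census
    after = adjusted[state] / total_adjusted
    shares[state] = (before, after)
    print(f"{state}: share {before:.4f} -> {after:.4f}")
```

The sketch only shows the direction in which shares move. Whether the adjusted share is closer to the truth depends on the actual state-level undercounts, which is the point at issue.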


The Simulations 


S and P prefer a strict definition of the synthetic assumption - ‘‘there is no variation at all’’ 
in undercount rates within race across geography. They say (p. 37) that they did not construct 
true populations on the basis of the synthetic assumption, and their definition of truth did not 
favor synthetic adjustment. 

We adopt their terminology for a moment. They constructed the true populations from the 
synthetic assumption plus random error. Indeed, the simulations hold the census counts fixed, 
and randomize the true population. The true population of racial group i in state j is assumed 
to equal the corresponding census count, multiplied by a random adjustment factor u_ij. See 
SP (1987) equation (1) on p. 967. This adjustment factor is drawn at random from a distribution 
which depends - by assumption - on the racial group but not the state. See SP (1987) 
equation (2) on p. 967. 
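
The design just described can be sketched in a few lines. This is an illustration under stated assumptions, not SP's code: the census counts, the normal distribution, and its means and spreads are all hypothetical. The only feature taken from the text is that the distribution of the adjustment factor depends on the racial group and not on the state.

```python
import random

# Hypothetical census counts by (race group, state), held fixed throughout.
census = {("black", "A"): 200_000, ("black", "B"): 50_000,
          ("white", "A"): 800_000, ("white", "B"): 950_000}

# Assumed mean and spread of the adjustment factor for each race group.
# The distribution depends on race only, never on the state.
factor_dist = {"black": (1.05, 0.02), "white": (1.01, 0.01)}


def simulate_truth(census, factor_dist, rng):
    """Draw one 'true' population: census count times a random factor."""
    truth = {}
    for (race, state), count in census.items():
        mean, sd = factor_dist[race]
        u = rng.gauss(mean, sd)   # the normal draw is an assumption of this sketch
        truth[(race, state)] = count * u
    return truth


rng = random.Random(0)
truth = simulate_truth(census, factor_dist, rng)
for key in sorted(truth):
    print(key, round(truth[key]))
```

Because every state draws from the same race-specific distribution, any difference between states in the simulated undercount rates within a race is random by construction.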

The simulations assume away systematic variation in undercount rates within race across 
geography. On the other hand, synthetic adjustment assumes that the structure of undercounts 
is determined by race not geography. That was our point on p. 10, and it is right. 


Indeed, S and P concede (p. 37): 


‘‘We considered cases of extreme, albeit nonsystematic, interstate variation in 
undercounts by race ....’’ 


The ‘‘albeit nonsystematic’’ is their concession; the ‘‘extreme’’ must be the defense. 


The a fortiori Argument 


S and P say their simulations were conservative; the real pattern of variation in undercount 
rates across areas would favor synthetic adjustment even more strongly than the assumptions 
they made. (See e.g. p. 38). SP (1987) had a priori arguments to that effect. Passel (1987) shows, 
among other things, that such arguments do not prove much about 1980; see the Appendix 
to our paper. SP (1987, p. 977) make some empirical arguments, using data that are ‘‘seriously 
flawed, based on heroic assumptions’’; S and P’s language (p. 38), not ours. Further discussion 
seems unnecessary. 

On p. 38, S and P introduce new analysis based on PEP to justify the parameters in the 
simulations. In present context, that is quite a move: EKT want us to believe the PEP series 
because they are like the synthetics, while S and P want us to believe the synthetics because 
simulations are like PEP. 

Before we accept either, we want some evidence. Circular reasoning is not persuasive. 


11. Hartigan 


Synthetic Adjustment 


Hartigan rejects Schirm and Preston, but argues strongly in favor of synthetic adjustment 
(p. 45). He says, 


‘‘What about the following analytic argument? Suppose the national undercounts are 
correctly estimated, but the undercounts differ over states . . .. National undercount rates 
of 5% and 1% are supported by historical data from the Bureau, both by demographic 
analysis and post enumeration surveys ... I will ignore variations in the non-minority 
undercount between States ....’’ 


Unless we are much mistaken, these analytic arguments are too far from the facts to be rele- 
vant. Hartigan’s basic assumption is that the national undercount rates are known. That 
assumption is wrong: we doubt that the rates can be reliably estimated, either by demographic 
analysis or the PES, to within a factor of 2. See our discussion of Cressie, above. Furthermore, 
Hartigan ignores variations in non-minority undercount rates across states. Such variation has 
to matter: for example, a 1% undercount among 9 million people in a state has almost twice 
the impact of a 5% undercount among 1 million. 
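
The arithmetic behind this example is easily checked. The sketch below uses only the figures quoted in the text; the function name is ours, for illustration.

```python
def persons_missed(rate_percent, population):
    """Number of people missed, given an undercount rate in percent."""
    return rate_percent / 100.0 * population


large_group_missed = persons_missed(1.0, 9_000_000)   # 1% of 9 million
small_group_missed = persons_missed(5.0, 1_000_000)   # 5% of 1 million

ratio = large_group_missed / small_group_missed
print(round(large_group_missed), round(small_group_missed), round(ratio, 2))
```

The low rate applied to the large group misses about 90,000 people, against about 50,000 for the high rate applied to the small group.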


Modeling 
Hartigan says 


‘‘I would suspect that the assumptions of the regression can not be easily defended, but 
that the results of the regression are reasonable, except perhaps in producing lower standard 
errors that are justified by the probable lack of independence .... Reduction in sampling 
variance by regression-based smoothing procedures is not likely to make much difference 
to estimates in large localities such as States .... A healthy skepticism about any resulting 
‘standard errors’ or ‘confidence intervals’ is justified. [p. 48, emphasis omitted].’’ 


For 1980, the choice of variables makes a lot of difference to the adjustments for small areas. 
See FN p. 9. For 1990, the ‘‘raw’’ adjustment factors (computed directly from the sample 
without regression) have such large sampling errors as to be unusable, even at the state level. So 
the adjusters need to smooth. But the choice of smoothing models makes quite a difference to 
the results. See the Secretary’s Decision (U.S. Department of Commerce 1991a, pp. 2.46-2.55) 
and consider the numbers in the Press Release (U.S. Department of Commerce 1991b). 

Furthermore, the argument for adjusting rides on a ‘‘loss function analysis,’’ which uses 
variances computed from the smoothing model to make unbiased estimates of risk. The model 
is known to be too optimistic about its variances, perhaps by a factor of 5; see FN p. 10, 
Ylvisaker (1991), our main paper Section 7.3, and Fay (1992). If ‘‘healthy skepticism’’ is applied 
to the loss functions, we see no arguments left on the table for the efficacy of proposed 
adjustments. 

We expect to discuss the Bureau’s loss function analysis in another paper. Hartigan does 
his own calculations on p. 49; again, they are too far removed from the data to carry much 
weight. In any event, readers can look at the Bureau’s analysis (Mulry 1991; Woltman et al. 
1991) before buying any conclusions. 


The Third Pope 


‘‘The bureau has produced a number of estimated undercounts, with margins of error, 
in the various states. I use the ‘selected PES method’ (called PES from now on) .... Now 
there are two popes, the enumeration and the PES figures. Which is correct? Well, you 
need a third pope, an infallible one .... [p. 49].’’ 


Hartigan is on to something important here. The Bureau’s ‘‘third pope’’ consists of the loss 
function analysis discussed above, and a ‘‘total error model’’ (Mulry 1991). These seem highly 
fallible: the loss function analysis because it depends on variances computed from the smoothing 
model, and the total error model because it depends on results from the Evaluation Followup 
to measure non-sampling error. (Furthermore, the two models interact in crucial ways, but 
that is a topic for another day.) 

The adjusters are trying to fix an undercount of maybe 2%. To do that, they need to control 
non-sampling error in the PES to well below 1%. They say they did it, on the basis of data 
from yet another sample survey - the Evaluation Followup. If they are measuring non-sampling 
errors in the PES to within a fraction of 1%, the errors in the Evaluation Followup have got 
to be an order of magnitude smaller. They must be kidding. 


The Five Questions 


Hartigan concludes with five questions, and we will answer two. (The first is edited slightly, 
for clarity.) 

(i) ‘‘Do Freedman and Navidi agree with these estimates of 17 million omissions and 
13 million erroneous enumerations?’’ We accept the numbers as rough estimates, subject to 
large and unknown biases as well as large and unknown standard errors. The difference of 
17 - 13 = 4 million may be off by a factor of 2 or more. Estimating a small number by taking 
the difference of two large numbers is a time-honored recipe for trouble. 

Furthermore, a crucial issue is where to put the 4 ± 2 million people. Fienberg doesn’t like 
the foothills of South Dakota. That narrows the options to 6.5 million blocks spread over 39,000 
jurisdictions. The PES gave us data on 0.2 of 1% of the blocks, and perhaps 10% of the jurisdic- 
tions. Great theater compels the audience to suspend disbelief. Adjustment does not reach that 
level. 

(ii) ‘‘If the PES is not good enough, how should the follow-up survey be designed so that 
it could be used to adjust the census?’’ The answer is a question of our own: What on earth 
makes him think it can be done at all? 
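
The arithmetic in answer (i) can be made concrete. The sketch below is illustrative only: the 17 million, 13 million, 6.5 million blocks, 39,000 jurisdictions, and sampling fractions come from the text, while the 10% relative error in each gross estimate is a hypothetical figure chosen to show how differencing magnifies error.

```python
# Part 1: a small difference of two large, uncertain numbers.
omissions = 17.0      # millions, from the text
erroneous = 13.0      # millions, from the text
rel_error = 0.10      # hypothetical relative error in each estimate

net = omissions - erroneous
low = omissions * (1 - rel_error) - erroneous * (1 + rel_error)
high = omissions * (1 + rel_error) - erroneous * (1 - rel_error)
print(f"net = {net:.1f} million, possible range {low:.1f} to {high:.1f} million")

# Part 2: how much of the country the PES actually observed.
blocks = 6_500_000          # blocks nationwide, from the text
jurisdictions = 39_000      # jurisdictions, from the text

sampled_blocks = round(blocks * 0.2 / 100)            # "0.2 of 1%" of the blocks
sampled_jurisdictions = round(jurisdictions * 0.10)   # "perhaps 10%"
print(sampled_blocks, "blocks and", sampled_jurisdictions, "jurisdictions sampled")
```

Under the assumed 10% error, the net figure could lie anywhere from about 1 to 7 million, and the sample covers roughly 13,000 of the 6.5 million blocks.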


12. Speed 


Adjustment depends on models and assumptions for which there is no empirical proof. That 
is Speed’s message, and we agree. 

To adjust the 1990 census, the population is divided into 1,392 ‘‘post strata,’’ or demo- 
graphic groups. One example is post stratum 90302112, male hispanic renters age 10-19 in cities 
in the Pacific Division. The adjustment depends on the ‘‘homogeneity assumption,’’ 
that undercount rates are more or less constant within each post stratum across geographical 
areas. See Freedman (1991) or (U.S. Department of Commerce 1991a, pp. 2.37-2.45, 
pp. 4.16-4.18). 

This assumption is hardly an obvious truth. The Bureau did some work to test it 
(Kim 1991). However, that study seems to have been quite poorly designed, and in any case 
gives rather mixed results. The theory of adjustment is particularly shaky when it comes to 
small areas. 


13. Ericksen and Kadane 


The Role of Assumptions 


We say that ‘‘success of any of EKT’s proposed adjustments rides on unverified and 
implausible assumptions.’’ EK answer (p. 52) that their ‘‘assumptions are realistic and verified 
by decades of census-taking knowledge, as we will argue below.’’ 

The argument they have in mind seems to be on p. 55: 


‘‘Freedman and Navidi argue that some of the assumptions underlying our 
regression model are ‘unverified’ and ‘implausible.’ As we have already argued, 
both in the EKT paper and elsewhere (Ericksen 1986; Kadane 1986) we believe that 
they are both realistic and based on a body of knowledge that has been collected 
for decades.’’ 


We reviewed the EKT paper, as well as (Ericksen 1986) and (Kadane 1986). We found no 
empirical evidence to substantiate the assumptions, or to quantify failures (e.g., to determine 
the real sizes of the correlations assumed to be 0), or to determine the impact of failures on 
model output. Instead, EK rely on arguments from convenience (a good model is ‘‘simple and 
tractable’’ and ‘‘permits smoothing,’’ Kadane 1986 p. 13). They also have their own variation 
on nothing-is-perfect rhetoric: 


‘‘.... In applications, only a very naive user would believe in the literal truth of the 
assumptions. Thus in my view, when I state and use an assumption, I mean that I 
think something like this is true, but surely I do not mean that exactly this is true .... 
(Kadane 1986, p. 14).’’ 


What makes EK think that ‘‘something like’’ their model is true? Convenience and nothing- 
is-perfect, even in combination, do not validate assumptions or quantify the impact of failures. 

Opening another front, EK tax us with having our own unverified assumptions. Our guilt 
on this score would hardly imply their innocence; but we deny the charges, or at least most 
of them. Three examples give the flavor of our ‘‘unverified assumptions’’ (p. 53). 


(i) A small undercount is thought to remain in the census. 


(ii) Minority persons living in central cities are likely to behave differently from those in 
suburbs. 


(iii) The undercount in conventional areas was relatively high. 


Point (i) still seems to be right. If EK will concede that it is wrong, we can all save a lot of 
courtroom time and journal pages. Point (ii) is obvious to anyone who has spent a few days 
in a big city in the U.S., but if data are needed, see Freedman et al. (1991), which also reviews 
some literature on this topic. On point (iii), we should have said ‘‘estimated undercount.’’ 
Touché. 


Evidence for Assumptions 
One example is enough. EK say (p. 52) there is 


‘‘evidence of greater census-taking problems in some areas than others, and ... PEP- 
measured omission rates and undercounts are higher in those areas with lower mailback 
rates, higher rates of missing data, and greater problems maintaining the specified long- 
form sampling rate on the census.’’ 


(For a brief review of the PEP series, see Sections 2 and 7 above.) At most, EK are proving 
that the PEP data have some relationship to undercount rates, and that we never denied. 
However, not all relationships can be summarized in a regression model. To get the model going, 
EK (p. 54) say only ‘‘there was nothing about the sources of information that made combining 
them inconsistent or unusual.’’ This is astonishingly weak, because the tests are only the 
following: (i) the model should have no internal contradictions; (ii) somebody else should 
already have done something similar. 


Adjusting Small Areas 


Earlier, EKT seemed to concede that they could not adjust small areas (p. 943, also see 
Section 4 of our main article). EK now withdraw the concession (p. 54), citing work by Tukey 
and Wolter and Causey. That work was reviewed in the Appendix to our paper. We do not 
find it convincing, and explained why. EK do not respond to our arguments. 


EKT’s Table 5 


EK say that our argument ‘‘goes too far.’’ However, their Table 5 is supposed to show that 
their preferred PEP series are in general agreement with synthetic estimates. Such agreement 
would demonstrate the value of PEP only if synthetic estimates were known to be accurate. 

That premise is doubtful, as discussed before. 

Furthermore, on the scale EKT chose, we found remarkable disagreement among their 
preferred PEP series. EK’s response: they previously restricted attention to 8 of the 12 PEP 
series, but now want to eliminate two more (from August). That is not good: among other 
reasons, the extreme difference noted in our equation (8) occurs with April series making the 
final cut. Next, EK average across their most preferred series. Averaging results from a 
sensitivity analysis to reduce variation is a peculiar idea, as discussed in Section 5 of our paper. 
We return to the point, below. 


EK go on to say (pp. 53-54): 


‘‘More importantly, all the 14 adjustments [the 12 PEP series and the two synthetic 
adjustments] improve upon the census by shifting population shares from areas where 
census-taking problems were low to areas where they were high.’’ 


This is euphemistic. As Table 5 in EKT makes clear (and see EKT p. 927, EK p. 53), the areas 
where census-taking problems were high are the areas with a high concentration of minority 
persons. However, as we explained in responding to Fienberg, legislative seats and tax moneys 
are allocated to geographical areas, not racial or ethnic groups. The key issue is whether adjust- 
ment would improve the accuracy of population shares for small geographical areas - states, 
cities, counties. EKT’s Table 5 is about broad groupings of cities and states. Such aggregates 
seem artificial. 


Which PEP Series? 


EK try yet one more time (p. 54) to justify their preference for 8 out of the 12 contending 
PEP series; they particularly seek to eliminate our dreaded foil, series 10-8. The main argument 
is concordance with demographic analysis at the national level. EK also claim that ‘‘the results 
in areas in which Blacks are concentrated are consistent with the demographic results.’’ This 
must be a slip in the prose, since demographic analysis gives no results below the national level. 

EK indicate (p. 54) that concordance matters, because demographic analysis ‘‘gives a reliable 
estimate of the national undercount.’’ Demographic analysis probably is more reliable than 
PEP. But it has real problems of its own: see our discussion of Cressie, above. Concordance 
is a weak argument. 

Furthermore, any agreement between PEP and demographic analysis at the aggregate level 
masks substantial differences in detail, as Jeff Passel showed in court. The arguments have 
been reviewed before, but we try again. PEP 2-9 is the most preferred of EKT’s preferred PEP 
series; Table B compares PEP 2-9 to demographic analysis. PEP 2-9 is a bit low on black males, 
100% too high on black females, 33% too low on white males, and too high on white females 
(0.5 of 1% vs. 0). The agreement has evaporated. 


Table B 


Comparing two ways of estimating undercount rates in the 1980 census: 
Demographic analysis (DA) and PEP 2-9. 


                   DA      PEP 2-9 
Black Males        8.8     8.1 
Black Females      3.1     6.4 
White Males        1.5     1.0 
White Females      0.0     0.5 


Source: Fay et al., 1988, Appendix D. 
Note: Demographic analysis is based on the series DA-2; ‘‘white’’ includes ‘‘other races.’’ 


Moreover, EK’s defense is not totally consistent. For instance, compare p. 54 with p. 55. 
On p. 54, national undercount rates are important; on p. 55, variation in national undercount 
rates is unimportant. And their position of the moment - averaged across pages - is inconsistent 
with that taken by the National Academy of Science Panel, where prominent participants were 
Steve Fienberg and Jay Kadane: 


‘‘There are a number of reasons, both a priori and a posteriori, supporting the various 
individual [PEP series] from this list of 12 .... For example, estimate 10-8 reduces the 
problem for movers when using the August P-sample .... These points among others 
are detailed in Bailar[’s affidavit in Cuomo v. Baldrige].’’ 


‘‘The use of these 12 estimates produced very different estimates of undercoverage for 
national demographic groups .... Some analysts have suggested that the number of 
acceptable estimates should be narrowed considerably. For example, Ericksen ... would 
discard all but the 2-8, 2-9, 3-8, and 3-9 estimates as either based on August data, which 
had a higher rate of cases with unresolved match status, or as making use of extreme 
assumptions in the adjustments for missing data. However, even within this restricted 
set, the national undercount rate ranges from 0.8 to 1.4 percent. [Cohen and Citro 1985, 
pp. 147-148.]’’ 


In short, even among EK’s most preferred series, different imputation models give different 
results. Nor is there good reason to discriminate against our foil 10-8. 

Likewise, EK say (p. 55) that ‘‘Schirm and Preston’s Synthetic B ..., while it improved 
over no adjustment, clearly did not go far enough.’’ This contradicts previous positions taken 
by the panel, albeit tentatively; see (Cohen and Citro 1985, p. 287; our paper, p. 8). 


Averaging and Sensitivity Analysis 


EK invite us (p. 54) to replace the various PEP series by the average, and to consider rms 
deviations from average. However, the point of the 12 different imputation schemes was to 
measure the impact of modeling. For that purpose, the range is the right statistic: two randomly 
selected imputation models may give similar results, yet a third may be quite different. In the 
end, EK want to do a sensitivity analysis, but downplay any model that is different from the 
other ones. 
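
A toy calculation shows why the choice of statistic matters here. The three estimates below are hypothetical, chosen so that two imputation models agree and a third does not; they are not PEP figures.

```python
import math

# Hypothetical undercount estimates (percent) for one area under
# three imputation models; two agree closely, the third does not.
estimates = [1.0, 1.1, 4.0]

mean = sum(estimates) / len(estimates)
rms_dev = math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates))
spread = max(estimates) - min(estimates)

print(f"rms deviation from average = {rms_dev:.2f}")
print(f"range = {spread:.2f}")
```

The rms deviation from the average comes out well below the range, so averaging understates how far apart two defensible models can be.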

EK propose again to subtract each series’ estimated national undercount rate from its 
estimates for the 66 study areas. EK are tacitly assuming - with no basis - that one imputation 
model holds for the whole country. Our analysis takes the view that data may be missing for 
different reasons in different parts of the country (Section 7.1). 

EK give new reasons (p. 55) for rejecting PEP 10-8. The argument comes down to this: if 
their preferred series are right, our foil 10-8 is wrong. Just so. Conversely, if 10-8 is right, their 
series are wrong. In other words, it matters which PEP series is used. When you are estimating 
small undercount rates, 8 percentage points of missing data make a difference. No statistical 
manipulation can change that awkward fact. 


Replication 
EK say, 


‘‘The problem with FN’s criticism is that they did not replicate our selection procedure 
correctly. As we explained in the article, and elsewhere (Ericksen and Kadane 1985; 1987, 
Section 6), 


(i) The observations were weighted by the inverses of the standard errors of the initial 
sample estimates. 


(ii) ‘Our estimate of the undercount rate is a matrix weighted average of a regression 
estimate and the initial sample estimates.’ [p. 56, order of points interchanged from 
original.]’’ 


Regrettably, EK are confounding two issues: (i) how you select variables, and (ii) what you 
do after selecting them. With respect to point (ii), EK are using the Lindley-Smith hierarchical 
Bayesian regression model. There is only one wrinkle: the parameter σ² is unknown and must 
be estimated; see FN pp. 5 and 11. 

Once the variables are selected and σ² is estimated, there is no ambiguity about EK’s 
estimator. See Ericksen-Kadane (1985) equation (3), FN equation (9), or the notes to Table 8 
in our main paper. Indeed, we were able to replicate their numbers in court; the judge even 
complimented us on the accuracy of the Berkeley computers. We illustrate the point again, 
with data in EKT. Their composite estimator based on PEP 2-9 can be extracted from Table 10. 
The lead example is St Louis, and their value for the estimator is 


1.24 + 0.66 + 4.16 + 1.09 = 7.13. 


Our value is 7.12. (The largest discrepancy we found was for Dallas: 6.22 vs. 6.18.) Given the 
variables, we can do the rest. 


Most of the discussion in FN, and in the present paper, depends on what you do after selec- 
ting the variables, and is immune from EK’s criticism of incorrect replication. In particular, 
despite EK having singled it out by number, our Table 8 is fine. It has nothing to do with the 
algorithm for variable selection, and we stand by it. 

The situation is otherwise with our equations (11-12), Table 7, and Tables 9-10, corresponding 
to Tables 5 and 6 in FN. Those calculations really do depend on the variable selection algorithm, 
and we discuss the implications after a few remarks to provide context. 

In 1986, EK criticized our simulations, but not on present grounds: the simulations started 
from the infamous series 10-8. The issue of OLS vs. GLS was not mentioned. In 1989, EKT 
criticized the simulations again, for yet another set of reasons: (i) we restricted attention to 
models with 3 variables, and (ii) we did not require the coefficients to be significant. 

They raise the issue of OLS vs. GLS now, for the first time. In response, we redo our calcula- 
tions once more, using GLS with observations ‘‘weighted by the inverses of the standard errors 
of the initial sample estimates’’; coefficients must be significant, but negative values are 
permitted. We report first on equations (11) and (12); t-statistics are shown in parentheses. 


(OLS 11)  PEP 2-9 = -2.23 + .079 min + .036 crime + .028 conv + residual 
                    (-4.0)  (5.4)      (3.6)        (3.5) 


rms residual = 1.53. 


(OLS 12)  PEP 2-9 = .120 min + .026 crime + .029 conv - .176 pov + residual 
                    (7.6)      (3.4)        (3.8)       (-4.4) 


rms residual = 1.49. 


(GLS 11)  PEP 2-9 = -3.37 + .054 min + .061 crime + .026 conv + residual 
                    (-6.0)  (3.6)      (5.4)        (5.0) 


rms residual = 1.60. 


(GLS 12)  PEP 2-9 = .118 min + .030 crime + .031 conv - .217 pov + residual 
                    (7.3)      (4.1)        (5.2)       (-5.4) 


rms residual = 1.53. 


Min is the percentage of minorities; crime, the crime rate; conv, the percentage who were 
conventionally enumerated; pov, the percentage below the poverty line. 

As will be seen, the weights make little qualitative difference (although the difference in 
t-statistics is noticeable). Under either regime, pov is quite significant. And the equation 
involving pov is superior, for it has smaller residuals. 

The poorer an area is, the less its undercount will be. That is what equation (12) ‘‘shows’’; 
other variables (i.e., racial makeup, crime rate, method of census enumeration) controlled for 
by the regression. This is in some conflict with EK’s theory of the undercount, despite their 
ingenious argument on p. 56. 
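
The sign of the poverty coefficient can be read off by evaluating the fitted equation for two areas that differ only in poverty. The sketch below takes the coefficients of equation (GLS 12) as printed above; the two areas and their covariate values are hypothetical, and the function name is ours.

```python
# Coefficients of equation (GLS 12) as printed in the text (no intercept).
COEF = {"min": 0.118, "crime": 0.030, "conv": 0.031, "pov": -0.217}


def fitted_gls12(area):
    """Fitted PEP 2-9 undercount (percent) from equation (GLS 12)."""
    return sum(COEF[name] * area[name] for name in COEF)


# Two hypothetical areas that differ only in the poverty rate.
area_low_pov = {"min": 30.0, "crime": 60.0, "conv": 0.0, "pov": 10.0}
area_high_pov = {"min": 30.0, "crime": 60.0, "conv": 0.0, "pov": 20.0}

diff = fitted_gls12(area_high_pov) - fitted_gls12(area_low_pov)
print(f"change in fitted undercount: {diff:.2f} percentage points")
```

Ten more percentage points of poverty lower the fitted undercount by 2.17 percentage points, whatever values the other variables are held at.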

The best equation satisfying EK’s current criteria is, in fact, equation (GLS 12). It does not 
have an intercept. If an intercept is required, the best equation is 


PEP 2-9 = 1.260 + 2.609 CC + .109 min + .0262 conv - .190 pov + residual 
          (2.1)   (2.9)      (5.1)      (4.1)        (-3.1) 


rms residual = 1.56. 


(CC is an indicator for central cities.) Thus, EK cannot have selected their variables quite the 
way they say they did. 

Again, pov comes in with a significant negative coefficient. Within a central city, there are 
only two variables: min and pov. The equation says that among minority neighborhoods, the 
poorer they are, the easier they are to count. 

(Equation (2) in EKT is a different GLS regression, with covariance matrix s²I + K rather 
than K; s² is the estimated value of σ², and K is the sample-based covariance matrix of the raw 
undercounts. See equations (1-6) in our paper.) 

Our point (p. 15) was that EK could not infer the model from the data; the switch to GLS does 
not really help them. EK say (p. 57), 


‘‘The real issue is how much the actual estimates obtained from the different regression 
equations vary. The answer to that, as we have shown above, is that the undercounts 
do not differ substantially.’’ 


That identifies one real issue, out of many. (Another is the impact of variable selection on 
nominal variances; see Fay 1992). However, if EK are returning to their position of 1986, that 
they can adjust subareas, then variable selection will matter: 


Table C 


RMS residuals from regression equations for PEP 2-9 and PEP 10-8. 
Explanatory variables include percent minority, percent conventionally enumerated, 
and either the crime rate or the percent urban. 


             Ordinary Least Squares       Generalized Least Squares 
             Crime         Percent        Crime         Percent 
             Rate          Urban          Rate          Urban 
PEP 2-9      1.53          1.54           1.60          [illegible] 
PEP 10-8     [illegible]   [illegible]    [illegible]   [illegible] 


‘‘For the 66 areas in the study, the choice of variables has some impact on the adjustments, 
but not a major one since both sets of variables span essentially the same column space. 
On the other hand, when extrapolating to subareas, the choice of variables matters a lot. 
[F and N, p. 9].’’ 


We turn next to Table 7, and recompute it using GLS. As Table C shows, for GLS as well 
as OLS, percent urban is a better variable than the crime rate; and PEP 10-8 is better than PEP 
2-9. EK’s reasons for excluding 10-8 do not survive inspection. 


The simulation in Table 9 comes out very much the same way, whether you select the 
variables by OLS or GLS. Table D repeats the simulation in Table 10, fitting by GLS. EK are 
right: urb comes in a little less often, CC noticeably more often. Still, urb beats three of EK’s 
variables (if by a whisker, in the case of MU). Furthermore, negative signs are hardly 
uncommon in the GLS runs, with paradoxical consequences noted above. 


Table D 


A simulation experiment on variable selection. 

PEP 2-9 is taken as ‘‘truth’’; percent urban (Urb) is permitted as an explanatory variable. 
The table shows the number of times each variable is entered, and the average of its coefficient 
(over the times it enters); 100 data sets were generated. In both regimes, coefficients 
must be significant; with GLS, negative values are permitted. 


            Ordinary Least Squares          Generalized Least Squares 
Variable    No. of Times    Average         No. of Times    Average 
            Entered         Coefficient     Entered         Coefficient 
CC          17              2.954           34              2.922 
Min         82              0.071           92              0.084 
Crime       53              0.053           40              0.055 
Conv        93              0.028           94              0.028 
Ed          5               0.085           11              -0.099 
Pov         1               0.135           25              -0.212 
Lang        17              0.315           5               0.417 
MU          0               [illegible]     18              -0.048 
Urb         [illegible]     0.060           19              0.053 


Notes: CC is an indicator for central cities; Min, the percentage of minorities; Crime, the crime rate; Conv, the 
percentage who were conventionally enumerated; Ed, the percentage with no high school degree; Pov, the 
percentage below the poverty line; Lang, the percentage who have difficulty with English; MU, the percentage 
living in multiple-unit housing. 


Final Comments 
EK say (p. 58), 


‘‘Freedman and Navidi hold the adjustment data to a higher standard than the unad- 
justed data. They take on faith, and contrary to decades of Census Bureau evidence, that 
the unadjusted data are accurate, and they do not seem to be concerned with an evident 
pattern of bias across areas.’’ 


That is wrong on all counts. Our article begins with a discussion of errors in the census, 
their variation across areas, and the resource implications. However, we think that the censuses 
of 1980 and 1990, with overall accuracy estimated in the range 98% to 99%, were considerable 
achievements. Management skills have been learned from two centuries of experience, and there 
was dedicated work by hundreds of thousands of ordinary citizens. These censuses were not 
perfect, but they were very good of their kind. 

Ericksen and Kadane have a novel statistical method which, they say, will improve on the 
census. Our response is this. Show us. Show us not by the standards of physics on the one hand 
or ESP research on the other, but by the standards of rational argument. Two court cases and 
countless journal articles later, we find that Ericksen and Kadane cannot make the argument. 
But readers will judge for themselves. 


REML Estimation in Empirical Bayes Smoothing 
of Census Undercount 


ABSTRACT 


One way to assess the undercount at subnational levels (e.g. the state level) is to obtain sample data from 
a post-enumeration survey, and then smooth those data based on a linear model of explanatory variables. 
The relative importance of sampling-error variances to corresponding model-error variances determines 
the amount of smoothing. Maximum likelihood estimation can lead to oversmoothing, so making the 
assessment of undercount over-reliant on the linear model. Restricted maximum likelihood (REML) 
estimators do not suffer from this drawback. Empirical Bayes prediction of undercount based on REML 
will be presented in this article, and will be compared to maximum likelihood and a method of moments 
by both simulation and example. Large-sample distributional properties of the REML estimators allow 
accurate mean squared prediction errors of the REML-based smoothers to be computed. 


KEY WORDS: Linear model; Maximum likelihood; Restricted maximum likelihood; Variance 
components. 


1. INTRODUCTION 


Although a census attempts to carry out a complete enumeration of the population, for 
various reasons the final tallies are inaccurate. Census personnel, from its director down to 
the thousands of temporary enumerators, are part of a mammoth task whose accuracy relies 
on everyone doing their jobs to perfection. 

Moreover, events that are beyond human control (e.g. weather, natural disaster) must stay 
within expected limits. Clearly, in a country the size of the U.S.A. (in terms of both population 
and geography), many opportunities arise to give an imperfect census count. But size is not 
the only problem; heterogeneity of both population and geography gives a differentially 
imperfect count. 

The inaccuracies are typically expressed in terms of undercount, so that a negative value 
implies an overcount. Suppose the U.S.A. is divided into i = 1, ..., n areas (e.g. states, 
including Washington DC). In the i-th area, let $T_i$ be the true (unknown) count and $C_i$ be the 
census count. Then the undercount, expressed as a percentage of the true count, is defined as, 

$$U_i = 100\,(T_i - C_i)/T_i. \qquad (1.1)$$

The problem of differential undercount is a serious one when census counts are used to 
apportion political power and revenue to areas and subareas. (Further discussion of these issues 
can be found in Ericksen and Kadane 1985, Freedman and Navidi 1986 and Cressie 1988). States 
like California, Texas, and New York would gain much from adjusting for undercount, i.e. 
from replacing $C_i$ with $F_i C_i$, where $F_i$ is an adjustment factor. 

The correct adjustment to use is, 

$$F_i = T_i / C_i, \qquad (1.2)$$

which is related to undercount by, 

$$F_i = \{1 - U_i/100\}^{-1}.$$

1 Noel Cressie, Department of Statistics, Iowa State University, Ames, IA, U.S.A. 50011. 


As it stands, (1.2) is not helpful for adjustment, since the true count $T_i$ is unknown. To obtain 
extra information that will allow $F_i$ to be estimated, the U.S. Census Bureau conducts a 
post-enumeration survey (PES) that determines whether people in the PES were or were not counted 
in the census (e.g. Wolter 1986). The survey consists of several hundred thousand households, 
yielding "raw" adjustment factors $\{Y_i : i = 1, \ldots, n\}$ that are in need of smoothing. 
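
The relation between undercount and adjustment factor can be checked with a few lines of code. The sketch below is illustrative only: the function names and the counts for the single area are hypothetical, and it simply evaluates the percentage undercount of the true count and the factor $F = T/C$.

```python
def undercount_percent(true_count: float, census_count: float) -> float:
    """Undercount as a percentage of the true count; negative means overcount."""
    return 100.0 * (true_count - census_count) / true_count


def adjustment_factor(undercount_pct: float) -> float:
    """Adjustment factor F = {1 - U/100}^(-1), which equals T / C."""
    return 1.0 / (1.0 - undercount_pct / 100.0)


# Hypothetical area: 1,000,000 true residents, of whom 980,000 were counted.
T, C = 1_000_000.0, 980_000.0
U = undercount_percent(T, C)   # percentage undercount
F = adjustment_factor(U)       # factor that maps the census count back to T
print(U, F, F * C)
```

Multiplying the census count by the factor recovers the true count, which is why the factor cannot be used directly when the true count is unknown.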


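Assume that, given $F_i$, 

$$Y_i \sim \mathrm{Gau}(F_i, \delta_i^2), \qquad (1.3)$$

i.e. $Y_i$ has, conditional on $F_i$, a Gaussian distribution with mean $F_i$ and variance $\delta_i^2$. Adding 
the further assumption of independence, one obtains, 

$$Y \sim \mathrm{Gau}(F, \Delta), \qquad (1.4)$$

where $Y = (Y_1, \ldots, Y_n)'$, $F = (F_1, \ldots, F_n)'$, and $\Delta$ is the $n \times n$ diagonal matrix 
$\mathrm{diag}\{\delta_1^2, \ldots, \delta_n^2\}$. 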

Now assume that, 

$$F \sim \mathrm{Gau}(X\beta, \Gamma(\tau^2)), \qquad (1.5)$$

where $X$ is an $n \times p$ matrix of explanatory variables, $\beta$ is a $p \times 1$ vector of (unknown) 
coefficients of the linear model, and $\Gamma(\tau^2)$ is an $n \times n$ diagonal matrix: 

$$\Gamma(\tau^2) = \tau^2 D, \qquad (1.6)$$

and $D = \mathrm{diag}\{1/C_1, \ldots, 1/C_n\}$. The heteroskedastic model (1.5) and (1.6) is discussed at 
considerable length in Cressie (1990). It is intuitively sensible that the adjustment factor, for 
an area whose population is large, has a smaller variance; Cressie (1989) provides both a 
Bayesian and a frequentist justification for this intuition. 


Another way to write the model (1.4) and (1.5) is: 

$$Y = X\beta + \nu + \epsilon, \qquad (1.7)$$

where the $n \times 1$ vectors $\nu$ and $\epsilon$ are statistically independent, $\nu \sim \mathrm{Gau}(0, \Gamma(\tau^2))$, and 
$\epsilon \sim \mathrm{Gau}(0, \Delta)$. Now, assuming that $\delta_1^2, \ldots, \delta_n^2$ are calculated using sampling-variance 
formulas appropriate for the PES sampling frame, the only parameters left to estimate are $\beta$ and 
$\tau^2$. Thus, the two variance components $\Delta$ and $\Gamma(\tau^2)$ only contribute one unknown parameter, 
namely $\tau^2$. It is worth noting that the methods developed in this article can be easily 
generalized beyond this simple variance-components problem. The general linear model is 
considered in Section 3. 
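
To make the two-stage model concrete, the following sketch draws true adjustment factors and raw PES factors from it. It is a minimal illustration under stated assumptions: the function name `simulate_factors`, the design matrix, the coefficients, the census counts and the sampling variances are all hypothetical values chosen for the example, not data from the article.

```python
import random


def simulate_factors(X, beta, tau2, census, delta2, seed=0):
    """Draw (F, Y) with F_i ~ Gau(x_i' beta, tau2 / C_i) and Y_i | F_i ~ Gau(F_i, delta2_i)."""
    rng = random.Random(seed)
    F, Y = [], []
    for x_i, c_i, d2_i in zip(X, census, delta2):
        mean_i = sum(x * b for x, b in zip(x_i, beta))        # (X beta)_i
        f_i = mean_i + rng.gauss(0.0, (tau2 / c_i) ** 0.5)    # model error
        F.append(f_i)
        Y.append(f_i + rng.gauss(0.0, d2_i ** 0.5))           # sampling error
    return F, Y


# Hypothetical inputs: intercept plus one regressor, three areas.
X = [[1.0, 10.0], [1.0, 20.0], [1.0, 30.0]]
beta = [1.0, 0.001]
census = [5.0e5, 2.0e6, 8.0e6]
delta2 = [4.0e-5, 2.0e-5, 1.0e-5]
F, Y = simulate_factors(X, beta, tau2=50.0, census=census, delta2=delta2, seed=1)
print(F, Y)
```

Because the model-error variance is $\tau^2/C_i$, the simulated true factor of a large area stays closer to the regression line than that of a small area.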

In Section 2, the Bayes predictor and the empirical Bayes predictor of $F$ will be given. 
Estimation of $\beta$ is straightforward, but there are several possible ways $\tau^2$ could be estimated. 
Section 3 presents maximum likelihood (m.l.), method-of-moments, and restricted maximum 
likelihood (REML) approaches. The effect of estimation of $\tau^2$ on mean squared prediction 
errors is investigated in Section 4. Section 5 compares the approaches by simulation and by 
example, and Section 6 presents conclusions and a discussion. 


2. EMPIRICAL BAYES PREDICTION 


In this article, the true population of any small area is considered to be unknown. After 
observing the corresponding census population, the uncertainties about the true population 
are updated. Therefore, statistical models for undercount are conditional on the observed census 
counts. The model (1.4), (1.5), and (1.6) has been introduced in Section 1, and will be assumed 
throughout Sections 2, 3, and 4. 

Using a matrix analogue of squared-error loss, the optimal predictor is $E(F \mid Y)$ (Cressie 
1990), which is, 

$$p^*(Y) = \Gamma(\tau^2)(\Delta + \Gamma(\tau^2))^{-1} Y + \Delta(\Delta + \Gamma(\tau^2))^{-1} X\beta, \qquad (2.1)$$

and the mean-squared-prediction-error matrix is, 

$$E\{(F - p^*(Y))(F - p^*(Y))'\} = \Gamma(\tau^2) - \Gamma(\tau^2)(\Delta + \Gamma(\tau^2))^{-1}\Gamma(\tau^2). \qquad (2.2)$$

For the loss matrix, $L(F, p) = (F - p)(F - p)'$, (2.1) is easily seen to be a Bayes predictor 
of $F$. In reality, $\beta$ and $\tau^2$ are unknown and so (2.1) is not a statistic (i.e. is not a function only 
of the data). The proper Bayesian approach would be to put further priors and hyperpriors 
on all unknown parameters. (This solution to the conundrum of unknown parameters is 
sometimes called hierarchical Bayes, and demands a prior knowledge of process variability 
that many scientists do not feel they have. Nevertheless, noninformative priors and hyperpriors, 
particularly, often yield sensible estimators.) Often the posterior distributions are analytically 
intractable. Should the model and prior be specified according to their conditional 
distributions, the Gibbs sampler could be used to obtain, numerically, all required marginal and joint 
distributions (e.g. Gelfand and Smith 1990). 

An alternative approach, the one taken in this article, is to treat all parameters, except $F$, 
as fixed but unknown, and to use the data $Y$ to estimate them. This approach is called empirical 
Bayes. Although a parametric (conjugate) prior is assumed in this article, one could also work 
with a nonparametric prior (e.g. Laird and Louis 1987). 

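Suppose now that $\beta$ is unknown, but that $\tau^2$ in (1.6) is (for the moment) known. Again, 
using the matrix analogue of squared-error loss, the optimal linear unbiased predictor is 
obtained by substituting the generalized-least-squares estimator: 

$$\hat\beta(\tau^2) = \{X'(\Delta + \Gamma(\tau^2))^{-1}X\}^{-1} X'(\Delta + \Gamma(\tau^2))^{-1} Y$$

into (2.1), yielding 

$$\hat p(Y; \tau^2) = \Gamma(\tau^2)(\Delta + \Gamma(\tau^2))^{-1} Y + \Delta(\Delta + \Gamma(\tau^2))^{-1} X\{X'(\Delta + \Gamma(\tau^2))^{-1}X\}^{-1} X'(\Delta + \Gamma(\tau^2))^{-1} Y \equiv \Lambda(\tau^2) Y \qquad (2.3)$$

(Cressie 1990). The mean-squared-prediction-error matrix is, 

$$M_1(\tau^2) = E\{(F - \hat p(Y; \tau^2))(F - \hat p(Y; \tau^2))'\} = \Lambda(\tau^2)\Delta\Lambda(\tau^2)' + (\Lambda(\tau^2) - I)\Gamma(\tau^2)(\Lambda(\tau^2) - I)'. \qquad (2.4)$$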


More realistically, $\tau^2$ is also unknown. An empirical Bayes predictor is obtained by 
substituting an estimator $\hat\tau^2$ into $\Lambda(\tau^2)$ to yield, 

$$\hat p(Y; \hat\tau^2) = \Lambda(\hat\tau^2) Y. \qquad (2.5)$$

It is easy to see that when $\hat\tau^2$ is the maximum likelihood estimator of $\tau^2$, then (2.5) is the 
maximum likelihood estimator of the Bayes predictor. 

The predictor (2.5) was suggested by Ericksen and Kadane (1985) (and criticized by Freedman 
and Navidi 1986). Incidentally, the form of their predictors may look different from (2.1), 
(2.3), and (2.5), but they are in fact identical upon using the identity 
$A(A + B)^{-1}B = (A^{-1} + B^{-1})^{-1}$, where $A$ and $B$ are square matrices such that $A$, $B$, and $A + B$ have 
inverses. 
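
The matrix identity quoted above is easy to confirm numerically. The sketch below does so for one pair of 2 x 2 matrices; the helper functions and the entries of `A` and `B` are hypothetical and serve only as an illustration, not as a proof.

```python
def matmul(P, Q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(P):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add(P, Q):
    """Elementwise sum of two 2 x 2 matrices."""
    return [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]


# Hypothetical invertible matrices (A + B is invertible as well).
A = [[2.0, 0.5], [0.5, 1.0]]
B = [[1.0, 0.2], [0.2, 3.0]]

lhs = matmul(matmul(A, inv2(add(A, B))), B)   # A (A + B)^{-1} B
rhs = inv2(add(inv2(A), inv2(B)))             # (A^{-1} + B^{-1})^{-1}
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The two sides agree up to floating-point rounding, which is what allows predictors written in either form to be compared directly.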

By substituting $\hat\tau^2$ into (2.4), an estimator of the mean-squared-prediction-error matrix: 

$$M_1(\hat\tau^2) = \Lambda(\hat\tau^2)\Delta\Lambda(\hat\tau^2)' + (\Lambda(\hat\tau^2) - I)\Gamma(\hat\tau^2)(\Lambda(\hat\tau^2) - I)' \qquad (2.6)$$

is obtained. Since (2.6) does not take into account the estimation of $\tau^2$ in $\hat p(Y; \hat\tau^2)$, it is likely 
to be a biased estimator of $E\{(F - \hat p(Y; \hat\tau^2))(F - \hat p(Y; \hat\tau^2))'\}$. Further discussion of this 
important issue is given in Section 4. 

Having obtained $\hat\beta$ and $\hat\tau^2$, model diagnostics can be computed to check the fit of 
the estimated model. For example, a quantile-quantile plot of the standardized residuals 
$(\Delta + \Gamma(\hat\tau^2))^{-1/2}(Y - X\hat\beta)$ against expected order statistics from a unit Gaussian 
distribution was used to show no obvious lack of fit of the model used in Section 5. A more complete 
discussion of model diagnostics is given in Section 6. 


3. ESTIMATION OF VARIANCE-MATRIX PARAMETERS 


In this section, the general linear model, 

$$Y \sim \mathrm{Gau}(X\beta, \Sigma(\gamma)), \qquad (3.1)$$

will be assumed, where $\gamma$ is a $k \times 1$ vector of variance-matrix parameters. In particular, the 
model given by (1.4), (1.5), and (1.6) yields, 

$$\Sigma(\gamma) = \Delta + \Gamma(\tau^2), \qquad (3.2)$$

where $\gamma$ consists of only one parameter, $\tau^2$. 

For $\gamma$ known, estimation of $\beta$ is straightforward: 

$$\hat\beta(\gamma) = (X'\Sigma(\gamma)^{-1}X)^{-1} X'\Sigma(\gamma)^{-1} Y. \qquad (3.3)$$

More realistically, $\gamma$ is unknown and has to be estimated; substitution of that estimator into 
(3.3) then yields an estimated generalized least squares estimator of $\beta$. In the rest of this section, 
three different methods of estimating $\gamma$ will be considered. 


3.1 Maximum Likelihood Estimation 


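The negative log likelihood of $\beta$ and $\gamma$ is: 

$$L(\beta, \gamma) = (n/2)\log(2\pi) + (1/2)\log(|\Sigma(\gamma)|) + (1/2)(Y - X\beta)'\Sigma(\gamma)^{-1}(Y - X\beta). \qquad (3.4)$$

Minimization of this function yields maximum likelihood (m.l.) estimates $\hat\beta_{ml}$ and $\hat\gamma_{ml}$. The 
difficult part of this minimization involves finding $\hat\gamma_{ml}$. The Gauss-Newton (scoring) algorithm 
is given inter alia by Harville (1977) and Mardia and Marshall (1984) and is repeated here for 
notational completeness. 

Define, 

$$\Sigma_i(\gamma) = \partial\Sigma(\gamma)/\partial\gamma_i, \quad i = 1, \ldots, k,$$

$$\Sigma^i(\gamma) = \partial\Sigma(\gamma)^{-1}/\partial\gamma_i = -\Sigma(\gamma)^{-1}\Sigma_i(\gamma)\Sigma(\gamma)^{-1}, \quad i = 1, \ldots, k, \qquad (3.5)$$

the $k \times 1$ vector $L_\gamma$ to have i-th element: 

$$(L_\gamma)_i = (1/2)\,\mathrm{tr}(\Sigma(\gamma)^{-1}\Sigma_i(\gamma)) + (1/2)(Y - X\beta)'\Sigma^i(\gamma)(Y - X\beta), \qquad (3.6)$$

and the $k \times k$ matrix $J_\gamma$ to have (i,j)-th element: 

$$(J_\gamma)_{ij} = (1/2)\,\mathrm{tr}(\Sigma(\gamma)^{-1}\Sigma_i(\gamma)\Sigma(\gamma)^{-1}\Sigma_j(\gamma)). \qquad (3.7)$$

Then, 

$$\gamma^{(l+1)} = \gamma^{(l)} - (J_\gamma^{(l)})^{-1} L_\gamma^{(l)}, \qquad (3.8)$$

where $J_\gamma^{(l)}$ and $L_\gamma^{(l)}$ denote $J_\gamma$ and $L_\gamma$, respectively, evaluated at $\gamma = \gamma^{(l)}$ and $\beta = \hat\beta(\gamma^{(l)})$. 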


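When $\gamma$ consists of only $\tau^2$ in (1.6), the algorithm (3.8) is particularly straightforward. In 
the simulations and example given in Section 5, the starting value 

[Equation (3.9), which defines the starting value $(\tau^2)^{(0)}$, is illegible in the source.] 

was used. Then (3.8) is, 

$$(\tau^2)^{(l+1)} = (\tau^2)^{(l)} - \Big\{(1/2)\sum_{i=1}^{n}(C_i\delta_i^2 + (\tau^2)^{(l)})^{-2}\Big\}^{-1} L^{(l)}; \quad l = 0, 1, 2, \ldots, \qquad (3.10)$$

where 

$$L^{(l)} = (1/2)\sum_{i=1}^{n}(C_i\delta_i^2 + (\tau^2)^{(l)})^{-1} - (1/2)\{Y - X\hat\beta((\tau^2)^{(l)})\}'\,\mathrm{diag}\{C_i/(C_i\delta_i^2 + (\tau^2)^{(l)})^2\}\,\{Y - X\hat\beta((\tau^2)^{(l)})\}. \qquad (3.11)$$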
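
The scoring recursion for the single parameter $\tau^2$ can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the implementation used in the article: the function names are hypothetical, the starting value is simply supplied by the caller (the article's own starting value is not reproduced here), and the truncation of negative iterates at zero is an added assumption. It exploits the fact that all variance matrices are diagonal, so that $1/\Sigma_{ii} = C_i/(C_i\delta_i^2 + \tau^2)$.

```python
def solve(M, b):
    """Solve the small linear system M x = b by Gaussian elimination."""
    n = len(b)
    A = [row[:] + [b_i] for row, b_i in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - tail) / A[i][i]
    return x


def gls_beta(X, Y, w):
    """Generalized least squares for diagonal Sigma, with w_i = 1 / Sigma_ii."""
    p = len(X[0])
    XtWX = [[sum(w_i * x[a] * x[b] for x, w_i in zip(X, w)) for b in range(p)]
            for a in range(p)]
    XtWY = [sum(w_i * x[a] * y for x, y, w_i in zip(X, Y, w)) for a in range(p)]
    return solve(XtWX, XtWY)


def ml_tau2(X, Y, C, delta2, tau2=1.0, tol=1e-10, max_iter=200):
    """Scoring iterations for the m.l. estimate of tau^2; returns (tau2_hat, beta_hat)."""
    for _ in range(max_iter):
        v = [c * d2 + tau2 for c, d2 in zip(C, delta2)]      # C_i delta_i^2 + tau^2
        beta = gls_beta(X, Y, [c / v_i for c, v_i in zip(C, v)])
        r = [y - sum(a * b for a, b in zip(x, beta)) for x, y in zip(X, Y)]
        score = 0.5 * sum(1.0 / v_i for v_i in v) - 0.5 * sum(
            c * r_i ** 2 / v_i ** 2 for c, r_i, v_i in zip(C, r, v))
        info = 0.5 * sum(1.0 / v_i ** 2 for v_i in v)
        new_tau2 = max(tau2 - score / info, 0.0)   # truncation at 0 is an assumption
        if abs(new_tau2 - tau2) < tol:
            return new_tau2, beta
        tau2 = new_tau2
    return tau2, beta


# Toy check: intercept only, C_i = 1 and equal delta_i^2, so tau2_hat = RSS/n - delta^2.
Y = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.0] for _ in Y]
tau2_hat, beta_hat = ml_tau2(X, Y, C=[1.0] * 5, delta2=[0.5] * 5)
print(tau2_hat, beta_hat)
```

In the toy case the residual sum of squares is 10, so the closed-form answer is $10/5 - 0.5 = 1.5$, and the recursion reaches it after a single update.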


Iterating (3.8) to convergence yields the m.l. estimator $\hat\gamma_{ml}$, which upon substitution into 
(3.3) yields the m.l. estimator $\hat\beta(\hat\gamma_{ml})$. Under appropriate regularity conditions (e.g. Mardia 
and Marshall 1984) $(\hat\beta(\hat\gamma_{ml})', \hat\gamma_{ml}')'$ is approximately multivariate Gaussian, with mean 
$(\beta', \gamma')'$ and asymptotic variance matrix, 

$$\begin{pmatrix} (X'\Sigma(\gamma)^{-1}X)^{-1} & 0 \\ 0 & J_\gamma^{-1} \end{pmatrix}. \qquad (3.12)$$

When $\gamma$ consists of only $\tau^2$ in (1.6), the matrix (3.12) becomes, 

$$\begin{pmatrix} (X'\Sigma(\tau^2)^{-1}X)^{-1} & 0 \\ 0 & \{(1/2)\sum_{i=1}^{n}(C_i\delta_i^2 + \tau^2)^{-2}\}^{-1} \end{pmatrix}. \qquad (3.13)$$

In practice, estimated variances and covariances are obtained by evaluating (3.12) at the m.l. 
estimate $\hat\gamma_{ml}$. 


3.2 Method-of-Moments Estimation 


There is no single method-of-moments estimator of $\gamma$, but the general idea is to match 
low-order moments of the data with corresponding empirical moments. If only first- and second-order 
moments are used, it is clear that the Gaussian assumption in (3.1) is not needed. 

Let $U$ be a positive-definite symmetric matrix. Consider the weighted regression estimator, 
$\hat\beta_U = (X'U^{-1}X)^{-1}X'U^{-1}Y$, and the weighted residuals, 

$$e_U = U^{-1/2}(Y - X\hat\beta_U). \qquad (3.14)$$

Then, straightforward matrix algebra shows that, 

$$E(e_U' e_U) = \mathrm{tr}(\Sigma(\gamma)\Pi_U), \qquad (3.15)$$

where $\Pi_U = U^{-1} - U^{-1}X(X'U^{-1}X)^{-1}X'U^{-1}$. Assuming that $\Sigma(\gamma) = \Delta + \gamma_1\Gamma_1 + \ldots + \gamma_k\Gamma_k$, 
where the $\Gamma_i$'s are known, one obtains, 

$$\sum_{i=1}^{k} \gamma_i\,\mathrm{tr}(\Gamma_i\Pi_U) = E(e_U' e_U) - \mathrm{tr}(\Delta\Pi_U).$$

Choice of $k$ different $U_j$; $j = 1, \ldots, k$ (e.g. $U_1, U_1^2, \ldots, U_1^k$) yields $k$ equations in $k$ 
unknowns: 

$$\sum_{i=1}^{k} \hat\gamma_i\,\mathrm{tr}(\Gamma_i\Pi_{U_j}) = e_{U_j}' e_{U_j} - \mathrm{tr}(\Delta\Pi_{U_j}); \quad j = 1, \ldots, k, \qquad (3.16)$$

which can be solved for $\hat\gamma_1, \ldots, \hat\gamma_k$. It is important to check that the solution $\hat\gamma$ is in the 
parameter space $\{\gamma : \Sigma(\gamma) \text{ is positive-definite}\}$. 

When $\gamma$ consists of only $\tau^2$ in (1.6), only one matrix $U$ in (3.16) is needed. Previous undercount 
predictors have based their estimate of $\tau^2$ on $U = I$ (Ericksen and Kadane 1985; 
Freedman and Navidi 1986; Ericksen, Kadane and Tukey 1989), but a small sensitivity study 
for the heteroskedastic model (1.6) suggested a better estimator. 

Choose $U_\alpha = \Delta + \Gamma(\alpha)$ in (3.15) to mimic the model (1.7). Then, when $\alpha = \tau^2$ (the true 
value), Fay and Herriot (1979) show that 

$$E(e_{U_\alpha}' e_{U_\alpha}) = n - p, \qquad (3.17)$$

where $n$ is the number of areas, $p$ is the number of regressors in the matrix $X$ (e.g. $p = 3$ for 
the selected model in Section 5), and $e_U$ is the standardized residual defined by (3.14). Thus, 
the proposed method-of-moments estimator of $\tau^2$ is the value of $\alpha$ for which 

$$e_{U_\alpha}' e_{U_\alpha} = n - p, \qquad (3.18)$$

which can be solved using a Newton-Raphson iterative method or a simple bisection method; 
call the resulting estimator $\hat\tau^2_{mom}$. 
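
The bisection route mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the article's code: the function names are hypothetical, the matrices are taken to be diagonal as in the undercount model, and returning zero when the weighted residual sum of squares already falls below $n - p$ at $\alpha = 0$ is an added convention. The search relies on the weighted residual sum of squares decreasing as $\alpha$ grows.

```python
def solve(M, b):
    """Solve the small linear system M x = b by Gaussian elimination."""
    n = len(b)
    A = [row[:] + [b_i] for row, b_i in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - tail) / A[i][i]
    return x


def gls_beta(X, Y, w):
    """Weighted regression estimator for diagonal U, with w_i = 1 / U_ii."""
    p = len(X[0])
    XtWX = [[sum(w_i * x[a] * x[b] for x, w_i in zip(X, w)) for b in range(p)]
            for a in range(p)]
    XtWY = [sum(w_i * x[a] * y for x, y, w_i in zip(X, Y, w)) for a in range(p)]
    return solve(XtWX, XtWY)


def weighted_rss(alpha, X, Y, C, delta2):
    """Sum of squared standardized residuals for U = Delta + alpha * D."""
    w = [c / (c * d2 + alpha) for c, d2 in zip(C, delta2)]   # 1 / U_ii
    beta = gls_beta(X, Y, w)
    return sum(w_i * (y - sum(a * b for a, b in zip(x, beta))) ** 2
               for x, y, w_i in zip(X, Y, w))


def mom_tau2(X, Y, C, delta2, n_halvings=100):
    """Bisection for the alpha at which the weighted RSS equals n - p."""
    target = len(Y) - len(X[0])
    if weighted_rss(0.0, X, Y, C, delta2) <= target:
        return 0.0                                  # assumption: truncate at zero
    lo, hi = 0.0, 1.0
    while weighted_rss(hi, X, Y, C, delta2) > target:
        hi *= 2.0                                   # expand until the root is bracketed
    for _ in range(n_halvings):
        mid = 0.5 * (lo + hi)
        if weighted_rss(mid, X, Y, C, delta2) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy check: intercept only, C_i = 1, equal delta_i^2, so the root is RSS/(n-1) - delta^2.
Y = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.0] for _ in Y]
tau2_mom = mom_tau2(X, Y, C=[1.0] * 5, delta2=[0.5] * 5)
print(tau2_mom)
```

In the toy case the residual sum of squares is 10 with $n - p = 4$, so the root is $10/4 - 0.5 = 2$.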

Fay and Herriot (1979) note that the difference between $\hat\tau^2_{mom}$ and $\hat\tau^2_{ml}$ is manifest in how an 
area with small $\delta_i^2$ is weighted in the estimation procedure; $\hat\tau^2_{ml}$ gives relatively more weight to 
the squared residuals for such an area than does $\hat\tau^2_{mom}$. Based on this weighting property, and 
a small simulation study of bias, Cressie (1990) expressed a preference for $\hat\tau^2_{mom}$ over $\hat\tau^2_{ml}$. 
However, asymptotically, $\hat\tau^2_{ml}$ is fully efficient and has an accessible distribution theory. Lack 
of any (asymptotic) distributional results for $\hat\tau^2_{mom}$ causes its own set of problems, such as how 
to make inference on $\tau^2$, and how to carry out mean-squared-prediction-error corrections in 
Section 4. A more satisfactory estimator, with better bias properties than the m.l. estimator, 
is developed below. 


3.3 Restricted Maximum Likelihood Estimation 


The problem is to find a suitable estimator of the variance-matrix parameters $\gamma$ in (3.1). The 
method of restricted maximum likelihood (REML), developed originally by Patterson and 
Thompson (1971, 1974), applies maximum likelihood to error contrasts rather than to the data 
themselves. (Rao (1979) calls this method MML, marginal maximum likelihood, in the context 
of estimation of variance components. Recently, some authors have also called it residual 
maximum likelihood, although they have retained the abbreviation REML.) A linear combination 
$a'Y$ is called an error contrast if $E(a'Y) = 0$, for all $\beta$ and $\gamma$; thus, $a'Y$ is an error contrast 
if and only if $a'X = 0'$. 

Let $W = A'Y$ represent a vector of $(n - p)$ linearly independent error contrasts; i.e. the 
$(n - p)$ columns of $A$ are linearly independent and $A'X = 0$. Under the Gaussian assumption 
(3.1), $W \sim \mathrm{Gau}(0, A'\Sigma(\gamma)A)$, which does not depend on $\beta$. Thus, the negative log likelihood 
function is, 

$$L_W(\gamma) = ((n - p)/2)\log(2\pi) + (1/2)\log(|A'\Sigma(\gamma)A|) + (1/2)W'(A'\Sigma(\gamma)A)^{-1}W.$$

If another set of $(n - p)$ linearly independent contrasts were used to define $W$, the new 
negative log likelihood function would differ from $L_W(\gamma)$ only by an additive constant 
(Harville 1974). Indeed, for the $A$ that satisfies $AA' = I - X(X'X)^{-1}X'$ (and $A'A = I$), 

$$L_W(\gamma) = ((n - p)/2)\log(2\pi) - (1/2)\log(|X'X|) + (1/2)\log(|\Sigma(\gamma)|) + (1/2)\log(|X'\Sigma(\gamma)^{-1}X|) + (1/2)Y'\Pi(\gamma)Y, \qquad (3.19)$$

where $\Pi(\gamma) = \Sigma(\gamma)^{-1} - \Sigma(\gamma)^{-1}X(X'\Sigma(\gamma)^{-1}X)^{-1}X'\Sigma(\gamma)^{-1}$; see Harville (1974). 

A REML estimate of $\gamma$, denoted $\hat\gamma_{re}$, is obtained by minimizing (3.19) with respect to $\gamma$. The 
distinction between REML and m.l. estimation becomes important when $p$ is large relative to $n$. 
The REML method was originally proposed to estimate variance-component parameters: 
numerical algorithms (Harville 1977), robust adaptations (Fellner 1986), and distribution theory 
(Cressie and Lahiri 1991) have been developed in this context. Kitanidis (1983) and Zimmerman 
(1989) give computational details for producing an iterative minimization of (3.19). 

Harville (1974) provides a Bayesian justification for REML by assuming a noninformative 
prior for $\beta$, which is statistically independent of $\gamma$, and showing that the marginal posterior 
density of $\gamma$ is proportional to the restricted likelihood $\exp\{-L_W(\gamma)\}$, with $L_W(\gamma)$ given by 
(3.19), multiplied by the prior for $\gamma$. When that prior is noninformative, 
REML estimates correspond to marginal MAP (maximum a posteriori) estimates. 
Thus, in the situation where noninformative prior distributions for $\beta$ and $\gamma$ are independent, 
REML can be seen as a compromise between m.l. and Bayes estimation with squared error 
loss. In the case of model (1.4), (1.5) and (1.6), the latter would yield a Bayes estimate, 
$\int \tau^2 \exp\{-L_W(\tau^2)\}\,d\tau^2$ (with the restricted likelihood normalized to integrate to one), 
which can be obtained equivalently by averaging $\tau^2$, weighted by 
the full likelihood, $\exp\{-L(\beta, \tau^2)\}$. On the other hand, m.l. yields as an estimate of $\tau^2$ the 
value $\hat\tau^2_{ml}$, obtained by maximizing the full likelihood. REML averages the full likelihood over 
$\beta$ but maximizes the resulting (restricted) likelihood over $\tau^2$. 

Maximum likelihood estimation of $\tau^2$ tends to be biased towards zero because the 
likelihood, as a function of $\tau^2$, is skewed to the right. When normalized to integrate to one, 
the mean of such a function is generally larger than its mode (e.g. Groeneveld and Meeden 
1977). The m.l. estimate is based on the profile of the likelihood surface of $\beta$ and $\tau^2$, and this 
favors smaller values of $\tau^2$. (In contrast, REML is obtained by first integrating the likelihood 
over $\beta$ and then maximizing the result over $\tau^2$. Notice that Bayesians might advocate further 
integration over $\tau^2$.) 

Although the Bayesian interpretation of REML helps to explain its properties, $\hat\tau^2_{re}$ also has 
the obvious frequentist interpretation of being an estimator based on restricted information. 

Minimization of (3.19) with respect to $\gamma$ can proceed by any of the gradient algorithms. 
Recall, 

$$W = A'Y, \qquad (3.20)$$

and suppose $A$ satisfies: 

$$AA' = I - X(X'X)^{-1}X' \quad \text{and} \quad A'A = I.$$

For the moment, focus all attention on the $(n - p)$ "data" $W$; their joint distribution depends 
only on $\gamma$, and the associated negative log (restricted) likelihood is $L_W(\gamma)$ given by (3.19). 

Define the $k \times 1$ vector $M_\gamma$ to have i-th element: 

$$(M_\gamma)_i = \partial L_W(\gamma)/\partial\gamma_i = (1/2)\,\mathrm{tr}\{\Pi(\gamma)\Sigma_i(\gamma)\} - (1/2)Y'\Pi(\gamma)\Sigma_i(\gamma)\Pi(\gamma)Y, \qquad (3.21)$$

and the $k \times k$ matrix $G_\gamma$ to have (i,j)-th element: 

$$(G_\gamma)_{ij} = E(\partial^2 L_W(\gamma)/\partial\gamma_i\partial\gamma_j) = (1/2)\,\mathrm{tr}\{\Pi(\gamma)\Sigma_i(\gamma)\Pi(\gamma)\Sigma_j(\gamma)\}, \qquad (3.22)$$

where $\Pi(\gamma)$ is given below (3.19) and $\Sigma_i(\gamma)$ is defined by (3.5). (The expressions (3.21) and 
(3.22) were obtained by Harville 1977.) Then, the Gauss-Newton (scoring) algorithm to find 
$\hat\gamma_{re}$ is: 

$$\gamma^{(l+1)} = \gamma^{(l)} - (G_\gamma^{(l)})^{-1} M_\gamma^{(l)}, \qquad (3.23)$$

where $G_\gamma^{(l)}$ and $M_\gamma^{(l)}$ denote $G_\gamma$ and $M_\gamma$, respectively, evaluated at $\gamma = \gamma^{(l)}$. 


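When $\gamma$ consists of only $\tau^2$ in (1.6), the algorithm (3.23) is particularly straightforward. In 
the simulations and example given in Section 5, the starting value (3.9) was used. Then (3.23) is, 

$$(\tau^2)^{(l+1)} = (\tau^2)^{(l)} - (G_{\tau^2}^{(l)})^{-1} M_{\tau^2}^{(l)}; \quad l = 0, 1, 2, \ldots, \qquad (3.24)$$

where 

$$M_{\tau^2} = (1/2)\,\mathrm{tr}\{\Pi(\tau^2)D\} - (1/2)Y'\Pi(\tau^2)D\Pi(\tau^2)Y, \qquad (3.25)$$

$$G_{\tau^2} = (1/2)\,\mathrm{tr}\{\Pi(\tau^2)D\Pi(\tau^2)D\}, \qquad (3.26)$$

$$\Pi(\tau^2) = \Sigma(\tau^2)^{-1} - \Sigma(\tau^2)^{-1}X(X'\Sigma(\tau^2)^{-1}X)^{-1}X'\Sigma(\tau^2)^{-1}, \qquad (3.27)$$

are evaluated at $\tau^2 = (\tau^2)^{(l)}$. Also, recall that $\Sigma(\tau^2) = \Delta + \tau^2 D$ and 
$D = \mathrm{diag}\{1/C_1, \ldots, 1/C_n\}$. 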
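
The REML scoring recursion for $\tau^2$ can be sketched along the same lines. The code below is a minimal illustration under stated assumptions, not the article's implementation: the function names are hypothetical, the starting value is supplied by the caller, truncation of negative iterates at zero is an added assumption, and the projection-type matrix is formed explicitly, which is adequate for a few dozen areas but not tuned for large problems.

```python
def solve(M, b):
    """Solve the small linear system M x = b by Gaussian elimination."""
    n = len(b)
    A = [row[:] + [b_i] for row, b_i in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - tail) / A[i][i]
    return x


def reml_tau2(X, Y, C, delta2, tau2=1.0, tol=1e-10, max_iter=200):
    """Scoring iterations for the REML estimate of tau^2 with diagonal Sigma."""
    n, p = len(Y), len(X[0])
    d = [1.0 / c for c in C]                                   # diagonal of D
    for _ in range(max_iter):
        s = [1.0 / (d2 + tau2 * d_i) for d2, d_i in zip(delta2, d)]   # 1 / Sigma_ii
        XtSX = [[sum(s[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
                for a in range(p)]
        # V = (X' Sigma^{-1} X)^{-1}, built column by column (V is symmetric).
        V = [solve(XtSX, [1.0 if a == b else 0.0 for a in range(p)]) for b in range(p)]
        XV = [[sum(X[i][a] * V[a][b] for a in range(p)) for b in range(p)]
              for i in range(n)]
        # Pi = Sigma^{-1} - Sigma^{-1} X V X' Sigma^{-1}.
        Pi = [[(s[i] if i == j else 0.0)
               - s[i] * s[j] * sum(XV[i][b] * X[j][b] for b in range(p))
               for j in range(n)] for i in range(n)]
        PiY = [sum(Pi[i][j] * Y[j] for j in range(n)) for i in range(n)]
        # Gradient: (1/2) tr{Pi D} - (1/2) Y' Pi D Pi Y.
        M = 0.5 * sum(Pi[i][i] * d[i] for i in range(n)) - 0.5 * sum(
            d[i] * PiY[i] ** 2 for i in range(n))
        # Expected information: (1/2) tr{Pi D Pi D}.
        G = 0.5 * sum(Pi[i][j] ** 2 * d[i] * d[j]
                      for i in range(n) for j in range(n))
        new_tau2 = max(tau2 - M / G, 0.0)   # truncation at 0 is an assumption
        if abs(new_tau2 - tau2) < tol:
            return new_tau2
        tau2 = new_tau2
    return tau2


# Toy check: intercept only, C_i = 1, equal delta_i^2, so tau2_hat = RSS/(n-1) - delta^2.
Y = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.0] for _ in Y]
tau2_re = reml_tau2(X, Y, C=[1.0] * 5, delta2=[0.5] * 5)
print(tau2_re)
```

In the toy case REML divides the residual sum of squares by $n - p = 4$ rather than $n = 5$, giving $10/4 - 0.5 = 2$ against the m.l. value of 1.5, which illustrates the direction of the difference discussed in this section.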

Iterating (3.23) to convergence yields the REML estimator $\hat\gamma_{re}$. It has been proved by 
Cressie and Lahiri (1991) that $\hat\gamma_{re}$ is approximately multivariate Gaussian, with mean $\gamma$ and 
asymptotic variance matrix, 

$$G_\gamma^{-1}. \qquad (3.28)$$

When $\gamma$ consists of only $\tau^2$ in (1.6), the matrix (3.28) becomes a scalar, 

$$[(1/2)\,\mathrm{tr}\{\Pi(\tau^2)D\Pi(\tau^2)D\}]^{-1}. \qquad (3.29)$$

In practice, estimated variances and covariances are obtained by evaluating (3.28) at $\gamma = \hat\gamma_{re}$. 

Furthermore, the normalized (estimated) generalized least squares estimator, $\hat\beta(\hat\gamma_{re})$, should 
be approximately Gaussian with asymptotic variance matrix, $(X'\Sigma(\gamma)^{-1}X)^{-1}$. 


4. IMPROVED ESTIMATION OF MEAN SQUARED 
PREDICTION ERRORS 


In what is to follow, I shall be concerned with the effect, on prediction, of estimation of 
$\gamma$ in $\Sigma(\gamma)$ given by (3.1). Generalizing (1.5) to, 

$$F \sim \mathrm{Gau}(X\beta, \Gamma(\gamma)), \qquad (4.1)$$

it is clear that 

$$\Sigma(\gamma) = \Delta + \Gamma(\gamma). \qquad (4.2)$$

In principle, $\Delta$ could also depend on unknown parameters (in, e.g. a model for sampling 
variances) and the results of this section are equally applicable. The optimal linear unbiased 
predictor is, 

$$\hat p(Y; \gamma) = \Gamma(\gamma)(\Delta + \Gamma(\gamma))^{-1} Y + \Delta(\Delta + \Gamma(\gamma))^{-1} X\{X'(\Delta + \Gamma(\gamma))^{-1}X\}^{-1} X'(\Delta + \Gamma(\gamma))^{-1} Y \equiv \Lambda(\gamma)Y. \qquad (4.3)$$

Then, the mean-squared-prediction-error matrix of $\hat p(Y; \gamma)$, denoted $M_1(\gamma)$, is given by, 

$$M_1(\gamma) = \Lambda(\gamma)\Delta\Lambda(\gamma)' + (\Lambda(\gamma) - I)\Gamma(\gamma)(\Lambda(\gamma) - I)'. \qquad (4.4)$$

In reality, $\gamma$ is unknown and has to be estimated by $\hat\gamma$, say. The empirical Bayes predictor 
of $F$ is then $\hat p(Y; \hat\gamma)$, given by (4.3) with $\gamma = \hat\gamma$. In this case, $M_1(\gamma)$ is an inappropriate measure 
of the predictor's precision; one should use 

$$M_2(\gamma) = E\{(F - \hat p(Y; \hat\gamma))(F - \hat p(Y; \hat\gamma))'\}. \qquad (4.5)$$

It is the risk matrix (4.5), or an estimate of it, that should be given, along with the predictor 
$\hat p(Y; \hat\gamma)$. However, $M_1(\hat\gamma)$ is typically reported; hence, one should ask what inaccuracies result 
from using $M_1(\hat\gamma)$ and whether a more appropriate estimator of $M_2(\gamma)$ is available. 

Now, under the assumptions (4.1) and (4.2) (Gaussianity is important here) and provided 
$\hat\gamma$ is an even and translation invariant function of the data, the results of Harville (1985) can 
be used to establish that $M_2(\gamma) - M_1(\gamma)$ is non-negative-definite. (An estimator is even if 
$\hat\gamma(Y) = \hat\gamma(-Y)$ and is translation invariant if $\hat\gamma(Y + Xd) = \hat\gamma(Y)$, for any $p \times 1$ vector $d$.) 
When $\gamma$ consists of only $\tau^2$ in (1.6), the estimators $\hat\tau^2_{ml}$, $\hat\tau^2_{mom}$ and $\hat\tau^2_{re}$ are all even and 
translation invariant. Intuitively, estimation of the unknown parameters $\gamma$ leads to larger mean squared 
prediction errors; the result above quantifies this intuition. 

But, there is another potential source of bias due to the fact that $M_1(\hat\gamma)$, not $M_1(\gamma)$, is 
used to estimate the risk matrix. Suppose that $\hat\gamma$ is chosen to yield an unbiased estimator of 
the variance matrix of $(Y', F')'$, which most would agree is a desirable property. Then the 
results of Eaton (1985) and Zimmerman and Cressie (1991) can be used to establish that 
$M_1(\gamma) - E(M_1(\hat\gamma))$ is non-negative-definite. (The proof relies on a multivariate version of 
Jensen's inequality and on the fact that $\hat p(Y; \gamma)$, which can be written as $\Lambda(\gamma)Y$, minimizes 
the risk matrix over all linear unbiased predictors.) 


Upon writing, 

$$M_2(\gamma) - M_1(\hat\gamma) = \{M_2(\gamma) - M_1(\gamma)\} + \{M_1(\gamma) - E(M_1(\hat\gamma))\} + \{E(M_1(\hat\gamma)) - M_1(\hat\gamma)\}, \qquad (4.6)$$

the results above establish that underestimation of $M_2(\gamma)$ comes from two sources. Even if 
an expression for $M_2(\gamma)$ were known, it is likely that $M_2(\hat\gamma)$ would be biased for $M_2(\gamma)$, 
further illustrating the inherent difficulty in estimating mean squared prediction errors. 

A remedy has been suggested by Prasad and Rao (1990), based on asymptotic expansions 
of $M_2(\gamma)$. Consider prediction of undercount in the i-th area, and let $[M_2(\gamma)]_{ii}$ and $[M_1(\hat\gamma)]_{ii}$ 
denote the (i,i)-th elements of the risk matrices $M_2(\gamma)$ and $M_1(\hat\gamma)$, respectively. Then formal 
application of Prasad and Rao's proposal yields the estimator of $[M_2(\gamma)]_{ii}$: 

$$[M_2(\hat\gamma)]^*_{ii} = [M_1(\hat\gamma)]_{ii} + 2\,\mathrm{tr}\{A_{ii}(\hat\gamma)B(\hat\gamma)\}; \quad i = 1, \ldots, n. \qquad (4.7)$$

In (4.7), $A_{ii}(\gamma)$ is a $k \times k$ matrix given by, 

$$A_{ii}(\gamma) = \mathrm{var}\{\partial\hat p_i(Y; \gamma)/\partial\gamma\} \qquad (4.8)$$

and $B(\gamma)$ is a matrix that equals or approximates the $k \times k$ matrix, 

$$E\{(\hat\gamma - \gamma)(\hat\gamma - \gamma)'\}. \qquad (4.9)$$

For m.l. estimation, 

$$B(\gamma) = J_\gamma^{-1}, \qquad (4.10)$$

where $J_\gamma$ is given by (3.7), and for REML estimation, 

$$B(\gamma) = G_\gamma^{-1}, \qquad (4.11)$$

where $G_\gamma$ is given by (3.22). 


Kass and Steffey (1989) give approximations (to the conditional variance) that are similar 
in spirit to (4.7), for probability distributions that are not necessarily Gaussian. However, their 
approach requires independent replications, which is not a feature of the distributions specified 
by (3.1). 

Should small areas be aggregated, it is important to have an approximately unbiased 
estimator of all elements of $M_2(\gamma)$. It is not difficult to generalize (4.7) to, 

$$[M_2(\hat\gamma)]^*_{ij} = [M_1(\hat\gamma)]_{ij} + 2\,\mathrm{tr}\{A_{ij}(\hat\gamma)B(\hat\gamma)\}; \quad i, j = 1, \ldots, n,$$

where $A_{ij}(\gamma) = \mathrm{cov}\{\partial\hat p_i(Y; \gamma)/\partial\gamma, \partial\hat p_j(Y; \gamma)/\partial\gamma\}$. Prasad and Rao (1990) show that, to the 
same order of magnitude, $A_{ij}(\gamma)$ can be replaced by $\mathrm{cov}\{\partial p^*_i(Y)/\partial\gamma, \partial p^*_j(Y)/\partial\gamma\}$, where 
$p^*(Y)$ is given by (2.1); these latter derivatives can be simpler to calculate. 

When $\gamma$ consists of only $\tau^2$ in (1.6), calculation of $B(\gamma)$ is straightforward; see (3.13) and 
(3.26). Now, consider 

$$\mathrm{var}(\partial\hat p(Y; \tau^2)/\partial\tau^2) = (\partial\Lambda(\tau^2)/\partial\tau^2)\Sigma(\tau^2)(\partial\Lambda(\tau^2)/\partial\tau^2)', \qquad (4.12)$$

where $\Lambda(\tau^2)$ is given by (2.3). In terms of $\Pi(\tau^2)$ defined by (3.27), and $\Delta$ defined by (1.4), 

$$\Lambda(\tau^2) = I - \Delta\Pi(\tau^2). \qquad (4.13)$$

Thus, (4.12) can be calculated from (4.13) using the relationships (3.4) and (3.5). Then, 
$A_{ii}(\tau^2)$ given by (4.8) is the (i,i)-th element of, 

$$\Delta(\partial\Pi(\tau^2)/\partial\tau^2)\Sigma(\tau^2)(\partial\Pi(\tau^2)/\partial\tau^2)'\Delta', \qquad (4.14)$$

where, writing $\Sigma$ for $\Sigma(\tau^2)$, 

$$\begin{aligned} \partial\Pi(\tau^2)/\partial\tau^2 = {} & -\Sigma^{-1}D\Sigma^{-1} + \Sigma^{-1}D\Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} \\ & + \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}D\Sigma^{-1} \\ & - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}D\Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} \\ = {} & -\Pi(\tau^2)D\Pi(\tau^2); \end{aligned} \qquad (4.15)$$

recall that $\Sigma(\tau^2) = \Delta + \tau^2 D$, and $D = \mathrm{diag}\{1/C_1, \ldots, 1/C_n\}$. 

The estimator of mean squared prediction error, $[M_2(\hat\tau^2)]^*_{ii}$, is conjectured to be 
approximately unbiased (Prasad and Rao's (1990) results were obtained for a more specific model than 
is considered here). It is obtained by bringing together the relations (4.7), (4.14) and (4.10) for 
m.l. estimation, or (4.7), (4.14) and (4.11) for REML estimation. This estimator will be 
compared to the often-reported estimator $[M_1(\hat\tau^2)]_{ii}$, in Section 5, using 1980 U.S. Census and 
Post Enumeration Survey data. 


5. A COMPARISON OF ESTIMATORS BY EXAMPLE 
AND BY SIMULATION 


5.1 Example 


The PEP 3-8 data from the 1980 Post Enumeration Survey, for the $n = 51$ states of the USA 
(including Washington, DC) are used to illustrate the empirical Bayes approach. These data 
are presented in Cressie (1989, Table 1, "Total" columns) and the variances $\delta_1^2, \ldots, \delta_{51}^2$ in 
(1.3) are obtained from Cressie's "Total" column labeled $\sqrt{\mathrm{MSE}}$ (whose squared entries will 
be denoted $\mathrm{MSE}_1, \ldots, \mathrm{MSE}_{51}$). Using the relation $F_i = \{1 - U_i/100\}^{-1}$ and the $\delta$-method, 
$\delta_i^2 = (Y_i)^4(\mathrm{MSE}_i)/10^4$. Eight explanatory variables, given by Ericksen, Kadane and Tukey 
(1989), were collapsed to the 51 states (from 66 small areas that included cities, rest of states 
and states). The explanatory variables are: 


1. Minority percentage. 
2. Crime rate. 
3. Poverty percentage. 
4. Percentage with language difficulty. 
5. Education. 
6. Housing. 
7. Proportion of population in any of 16 prespecified central cities. 
8. Percentage conventionally counted in the census. 
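
The $\delta$-method variance quoted before the list of explanatory variables follows from the derivative of $F = \{1 - U/100\}^{-1}$ with respect to $U$, which is $F^2/100$, so that the variance of the factor is approximately $F^4\,\mathrm{MSE}/10^4$. The sketch below checks this against a numerical derivative. The function names and the values of `u` and `mse` are hypothetical and are not taken from the PEP data.

```python
def adjustment_factor(undercount_pct: float) -> float:
    """Adjustment factor F = {1 - U/100}^(-1)."""
    return 1.0 / (1.0 - undercount_pct / 100.0)


def factor_variance(y: float, mse: float) -> float:
    """Delta-method variance of the raw factor: Y^4 * MSE / 10^4,
    where MSE is the mean squared error of the percentage undercount."""
    return y ** 4 * mse / 1.0e4


# Hypothetical area: 2% estimated undercount with MSE 0.25 (root MSE 0.5).
u, mse = 2.0, 0.25
y = adjustment_factor(u)

# Central-difference slope dF/dU, then the first-order variance slope^2 * MSE.
h = 1.0e-6
slope = (adjustment_factor(u + h) - adjustment_factor(u - h)) / (2.0 * h)
numeric_variance = slope ** 2 * mse
print(factor_variance(y, mse), numeric_variance)
```

The closed form and the numerical approximation agree to several significant digits, as expected from a first-order expansion.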


To find a subset of these variables that provides a good model for undercount, I used the 
selection method of Ericksen, Kadane and Tukey (1989), but weighted the data proportionally 
to the square roots of the small areas' census counts. The variables selected were 1 (minority) 
and 5 (education), as well as the constant term. Henceforth, in this paper, these three variables 
will be the only ones considered in the linear model; i.e. only regression coefficients $\beta_0$, $\beta_1$ and 
$\beta_5$ will be fit. 

Under the model (1.4), (1.5) and (1.6), the unknown parameters are $\beta$ and $\tau^2$. From the 
scoring algorithm (3.8), the m.l. estimate of $\tau^2$ is: 

$$\hat\tau^2_{ml} = 47.32,$$

while from the scoring algorithm (3.23), the REML estimate of $\tau^2$ is: 

$$\hat\tau^2_{re} = 58.53.$$

This illustrates a phenomenon observed from the realizations of a simulation presented below, 
namely, that $\hat\tau^2_{ml} < \hat\tau^2_{re}$; an intuitive explanation is given in Section 3.3. (Parenthetically, 
Cressie (1990) obtained $\hat\tau^2_{mom} = 94.96$, but no general inequality between it, m.l., and REML 
is apparent.) 

From the formulas in Section 3, the following estimates (with estimated standard errors in 
parentheses) were obtained: 


                   m.l.                        REML 
$\hat\beta_0$       1.03227    (0.00708)        1.03246    (0.00724) 
$\hat\beta_1$       0.0006878  (0.0001402)      0.0006941  (0.0001436) 
$\hat\beta_5$      -0.001070   (0.000231)      -0.001078   (0.000236) 
$\hat\tau^2$        47.32      (32.87)          58.53      (standard error illegible in source) 


Notice that there is very little difference between the two sets of estimates, except for that of 
$\tau^2$. Upon using the m.l. and REML estimates in $\hat p_i(Y; \hat\tau^2)$ given by (2.5), $[M_1(\hat\tau^2)]_{ii}$ given by 
(2.6), and $[M_2(\hat\tau^2)]^*_{ii}$ given by (4.7); $i = 1, \ldots, n$, small-area predictors and estimated root 
mean squared prediction errors are obtained. Table 1 shows the results for the $n = 51$ states; 
also shown in the table are the raw undercount data $Y_i$, the fitted linear model $(X\hat\beta)_i$, and 
the weight, 

$$w_i = \hat\tau^2/(C_i\delta_i^2 + \hat\tau^2), \qquad (5.1)$$

such that 

$$\hat p_i(Y; \hat\tau^2) = w_i Y_i + (1 - w_i)(X\hat\beta)_i; \quad i = 1, \ldots, 51. \qquad (5.2)$$
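
The weighting in the predictor can be verified against entries of Table 1. The sketch below recomputes the REML predictor for three states from their raw factor, model fit and weight as printed in the table. The function names are hypothetical, and agreement is only expected to about one unit in the fourth decimal place because the tabulated inputs are themselves rounded.

```python
def shrinkage_weight(tau2: float, census: float, delta2: float) -> float:
    """Weight w = tau^2 / (C * delta^2 + tau^2) placed on the raw factor."""
    return tau2 / (census * delta2 + tau2)


def eb_predictor(w: float, y: float, fit: float) -> float:
    """Empirical Bayes predictor: weighted average of raw factor and model fit."""
    return w * y + (1.0 - w) * fit


# (raw factor, model fit, weight, tabulated predictor) from the REML part of Table 1.
rows = {
    "nod": (1.0005, 0.9969, 0.8931, 1.0001),
    "ten": (0.9717, 0.9966, 0.0755, 0.9947),
    "tex": (1.0037, 1.0149, 0.0482, 1.0144),
}
recomputed = {state: eb_predictor(w, y, fit) for state, (y, fit, w, _) in rows.items()}
for state, value in recomputed.items():
    print(state, round(value, 4), rows[state][3])
```

A small state such as North Dakota keeps most of its raw factor, whereas a large state such as Texas is pulled almost entirely onto the regression fit, reflecting how the weight depends on the product $C_i\delta_i^2$.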


Notice that $w_i$ for REML is consistently larger than $w_i$ for m.l., which is intuitively sensible 
since $\hat\tau^2_{ml}$ has a notoriously large, negative bias. Thus, REML estimation of $\tau^2$ results in less 
weight on the model term $(X\hat\beta)_i$, but in a way so that the effect of estimation of $\tau^2$ can be 
incorporated. 

It is interesting to notice that one pays a price for using REML; its root mean squared 
prediction errors are consistently larger. This is not surprising, since we know that (asymptotically) 
m.l. is 100% efficient. Further, notice that the improved root mean squared prediction error, 
$\sqrt{[M_2(\hat\tau^2)]^*_{ii}}$, is between 1% and 9% larger than $\sqrt{[M_1(\hat\tau^2)]_{ii}}$. 

With regard to prediction, one can assess the importance of m.l. versus REML estimation 
of $\tau^2$ by computing the weighted sum of squares, 

$$\sum_{i=1}^{51} (\hat p_i(Y; \hat\tau^2_{ml}) - \hat p_i(Y; \hat\tau^2_{re}))^2 C_i = 15.$$

When compared to, 

$$\sum_{i=1}^{51} (Y_i - 1)^2 C_i = 70{,}421$$

and 

$$\sum_{i=1}^{51} (Y_i - \hat p_i(Y; \hat\tau^2_{re}))^2 C_i = 26{,}033,$$

Table 1: Columns, from left to right, show the 51 states according to a three-letter identifier, 
their raw undercounts $\{Y_i\}$, their model fits $\{(X\hat\beta)_i\}$, their weights $\{w_i\}$ given by 
(5.1), their predictors (5.2) (headed F12), their root mean squared prediction errors 
$\{\sqrt{[M_1(\hat\tau^2)]_{ii}}\}$ (headed RMPE1), and their improved root mean squared prediction 
errors $\{\sqrt{[M_2(\hat\tau^2)]^*_{ii}}\}$ (headed RMPE2). The table follows. 
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Table 1 
REML 
STATE ye 

MDLFT WGHT F12 RMPEl1 RMPE2 

ala 0.9965 1.0037 0.1431 1.0026 0.00439 0.00453 
aka 1.0288 LOLS 0.4767 1.0229 0.00896 0.00976 
arz 1.0204 1.0158 0.0742 1.0162 0.00487 0.00500 
ark 0.9895 0.9962 0.1398 0.9953 0.00541 0.00562 
cal 1.0307 [0225 0.0682 P0231 0.00322 0.00327 
col 1.0033 1.0199 0.1926 1.0167 0.00473 0.00495 
con 0.9886 1.0079 0.1029 1.0059 0.00435 0.00451 
del 0.9938 1.0107 0.4571 1.0030 0.00739 0.00811 
fla 1.0144 1.0120 0.0785 120122 0.00289 0.00295 
gga 0.9955 1.0046 0.1639 1.0031 0.00391 0.00403 
hai 1.0111 1.0105 0.2785 1.0107 0.00678 0.00730 
idh 1.0125 1.0070 0.5627 1.0101 0.00531 0.00579 
ill 1.0211 1.0103 0.1170 1.0116 0.00257 0.00265 
ind 0.9936 1.0026 0.1413 1.0013 0.00334 0.00349 
iow 0.9932 1.0033 0.1478 1.0018 0.00452 0.00475 
kan 1.0056 1.0092 0.2215 1.0084 0.00466 0.00496 
kty 0.9845 0.9872 0.1519 0.9868 0.00507 0.00524 
lou 1.0234 1.0086 0.0263 1.0090 0.00476 0.00480 
mne 1.0201 0.9992 0.3703 1.0069 0.00593 0.00645 
mld 1.0242 1.0140 0.0712 1.0147 0.00406 0.00415 
mas 0.9882 1.0068 0.1945 1.0032 0.00323 0.00341 
mch 1.0079 1.0081 0.1601 1.0081 0.00259 0.00271 
min 1.0111 1.0049 0.2793 1.0066 0.00359 0.00383 
mis 1.0097 1.0086 0.1279 1.0087 0.00557 0.00575 
mou 1.0080 1.0010 0.1681 1.0022 0.00350 0.00367 
mon 1.0144 1.0059 0.3785 1.0091 0.00699 0.00761 
neb 1.0008 1.0071 O54 7 1.0039 0.00441 0.00480 
nev 1.0265 1.0151 0.2852 1.0183 0.00744 0.00802 
nwh 0.9842 1.0033 0.3080 0.9974 0.00684 0.00740 
nwj 1.0130 1.0105 0.0895 1.0107 0.00305 0.00314 
nwm 1.0236 1.0256 0.3276 1.0249 0.00611 0.00648 
nwy 1.0166 1.0119 0.0807 1.0123 0.00243 0.00247 
noc 1.0118 0.9998 0.0748 1.0007 0.00421 0.00430 
nod 1.0005 0.9969 0.8931 1.0001 0.00313 0.00324 
oho 1.0108 1.0044 Ocb2 nS 1.0052 0.00253 0.00263 
okl 0.9977 1.0018 0.1625 1.0011 0.00429 0.00451 
ore 1.0027 1.0089 0.2833 1.0071 0.00434 0.00464 
pen 0.9972 1.0013 0.1475 1.0007 0.00253 0.00263 
rhi 1.0089 0.9939 0.4167 1.0001 0.00625 0.00678 
soc 1.0632 1.0040 0.0216 1.0053 0.00555 0.00559 
sod 1.0008 0.9985 0.7538 1.0002 0.00464 0.00496 
ten 0.9717 0.9966 0.0755 0.9947 0.00439 0.00449 
tex 1.0037 1.0149 0.0482 1.0144 0.00341 0.00345 
uth 1.0040 1.0142 0.4010 1.0101 0.00524 0.00563 
vmt 0.9889 1.0018 0.8232 0.9912 0.00454 0.00479 
vir 1.0009 1.0058 OT a53 1.0049 0.00338 0.00354 
was 1.0142 1.0121 0.1305 1.0123 0.00418 0.00434 
wev 0.9942 0.9877 0.1452 0.9887 0.00603 0.00628 
wis 1.0173 1.0032 0.2877 1.0073 0.00325 0.00348 
wyo 1.0361 1.0127 0.3992 1.0221 0.00882 0.00963 
del 1.0375 1.0474 0.2191 1.0452 0.01081 0.01125 
Table 1 (concluded) 

ML 

STATE Y MDLFT WGHT Pr RMPE1 RMPE2 
ala 0.9965 1.0037 0.1190 1.0028 0.00415 0.00427 
aka 1.0288 1.0175 0.4241 1.0223 0.00850 0.00933 
arz 1.0204 1.0157 0.0608 1.0160 0.00448 0.00459 
ark 0.9895 0.9963 0.1161 0.9955 0.00506 0.00525 
cal 1.0307 1.0224 0.0559 1.0228 0.00314 0.00319 
col 1.0033 1.0198 0.1617 hor 0.00446 0.00466 
con 0.9886 1.0079 0.0849 1.0063 0.00398 0.00412 
del 0.9938 1.0107 0.4050 1.0039 0.00697 0.00771 
fla 1.0144 1.0120 0.0644 1.0121 0.00271 0.00276 
gga 0.9955 1.0046 0.1368 1.0034 0.00375 0.00385 
hai 1.0111 1.0105 0.2378 1.0106 0.00629 0.00679 
idh 1.0125 1.0070 0.5099 1.0098 0.00507 0.00559 
ill 1.0201 1.0103 0.0967 1.0113 0.00242 0.00248 
ind 0.9936 1.0026 0.1174 1.0015 0.00309 0.00323 
iow 0.9932 1.0034 0.1230 1.0021 0.00418 0.00438 
kan 1.0056 1.0091 0.1870 1.0085 0.00432 0.00460 
kty 0.9845 0.9874 0.1264 0.9870 0.00486 0.00502 
lou 1.0234 1.0086 0.0214 1.0089 0.00446 0.00449 
mne 1.0201 0.9993 0.3222 1.0060 0.00557 0.00608 
mld 1.0242 1.0139 0.0583 1.0145 0.00376 0.00384 
mas 0.9882 1.0068 0.1634 1.0037 0.00302 0.00319 
mch 1.0079 1.0081 0.1335 1.0081 0.00242 0.00252 
min 1.0111 1.0049 0.2386 1.0064 0.00339 0.00362 
mis 1.0097 1.0085 0.1060 1.0087 0.00526 0.00541 
mou 1.0080 1.0011 0.1404 1.0021 0.00326 0.00341 
mon 1.0144 1.0059 0.3299 1.0087 0.00656 0.00717 
neb 1.0008 1.0071 0.4587 1.0042 0.00420 0.00461 
nev 1.0265 1.0150 0.2439 1.0178 0.00692 0.00746 
nwh 0.9842 1.0033 0.2646 0.9983 0.00637 0.00691 
nwj 1.0130 1.0105 0.0736 1.0106 0.00283 0.00290 
nwm 1.0236 1.0254 0.2826 1.0249 0.00582 0.00617 
nwy 1.0166 1.0119 0.0663 1.0122 0.00231 0.00235 
noc 1.0118 0.9998 0.0614 1.0005 0.00401 0.00408 
nod 1.0005 0.9970 0.8710 1.0000 0.00310 0.00324 
oho 1.0108 1.0045 0.1055 1.0051 0.00236 0.00245 
okl 0.9977 1.0018 0.1356 1.0013 0.00396 0.00416 
ore 1.0027 1.0088 0.2421 1.0074 0.00408 0.00436 
pen 0.9972 1.0014 0.1227 1.0008 0.00239 0.00248 
rhi 1.0089 0.9940 0.3660 0.9995 0.00591 0.00645 
soc 1.0632 1.0041 0.0176 1.0051 0.00519 0.00523 
sod 1.0008 0.9985 O22 1.0002 0.00452 0.00490 
ten 0.9717 0.9967 0.0619 0.9951 0.00413 0.00422 
tex 1.0037 1.0148 0.0393 1.0144 0.00329 0.00332 
uth 1.0040 1.0141 0.3512 1.0105 0.00498 0.00536 
vmt 0.9889 1.0019 0.7901 0.9916 0.00445 0.00477 
vir 1.0009 1.0058 0.1467 1.0051 0.00317 0.00330 
was 1.0142 1.0120 0.1082 1.0123 0.00391 0.00406 
wev 0.9942 0.9879 0.1207 0.9886 0.00567 0.00590 
wis 1.0173 1.0033 0.2461 1.0067 0.00306 0.00328 
wyo 1.0361 1.0127 0.3494 1.0209 0.00829 0.00909 


del 1.0375 1.0470 0.1849 1.0452 0.01036 0.01078 
It is clear that, from a national perspective, prediction is not very sensitive to estimation methods 
for $\tau^2$. (Cressie (1990) reaches the same conclusion based on a similar comparison of $\hat{\tau}^2_{ml}$ and 
$\hat{\tau}^2_{mm}$.) However, from Table 1, it is equally clear that estimated root mean squared prediction 
errors are considerably more sensitive. 
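
The columns of Table 1 appear to be linked by the usual shrinkage form: up to rounding, each tabulated predictor equals WGHT × Y + (1 − WGHT) × MDLFT. The short Python sketch below checks this on four REML rows copied from the table. The helper name `shrink` is illustrative only, and the relationship is an observation from the tabulated values, not a formula quoted from this section.

```python
# Rows copied from the REML panel of Table 1:
# (state, Y, MDLFT, WGHT, tabulated predictor).
rows = [
    ("ala", 0.9965, 1.0037, 0.1431, 1.0026),
    ("arz", 1.0204, 1.0158, 0.0742, 1.0162),
    ("ark", 0.9895, 0.9962, 0.1398, 0.9953),
    ("wyo", 1.0361, 1.0127, 0.3992, 1.0221),
]


def shrink(y, model_fit, weight):
    """Weighted combination of the raw factor and the model fit."""
    return weight * y + (1.0 - weight) * model_fit


# Largest absolute gap between the recomputed and tabulated predictors.
max_gap = max(abs(shrink(y, fit, w) - pred) for _, y, fit, w, pred in rows)
print(f"largest discrepancy: {max_gap:.6f}")
```

The discrepancies are of the size expected from four-decimal rounding of the inputs, which also explains why states with small weights (for example arz) have predictors very close to the model fit.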

Cressie (1990) gives expressions for the risks of adjusting using $p(Y;\tau^2)$ and of not 
adjusting. When $\hat{\tau}^2_{rl}$ and $\hat{\beta}(\hat{\tau}^2_{rl})$ are substituted into those expressions, the risk of adjusting 
is 3,253, while the risk of not adjusting is 34,134. That is, not adjusting leads to a 949% increase 
in risk (provided the model defined by (1.4), (1.5) and (1.6) holds). 
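
The quoted percentage can be recomputed directly from the two risks. This is only an arithmetic check in Python; the variable names are illustrative.

```python
risk_adjust = 3253.0      # risk of adjusting, as quoted in the text
risk_no_adjust = 34134.0  # risk of not adjusting, as quoted in the text

# Relative increase in risk from not adjusting, in percent.
pct_increase = 100.0 * (risk_no_adjust - risk_adjust) / risk_adjust
print(round(pct_increase))
```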


5.2 Simulation 


To check the asymptotic distribution theory of the REML (and m.l.) estimator of $\tau^2$, a 
simulation was carried out on the linear model described in Section 5.1, with parameter values: 


$\beta_0 = 1.0330$, $\beta_1 = 0.000712$, $\beta_2 = -0.000110$, $\tau^2 = 95.00$. (5.3) 


The simulation, 

$Y \sim \mathrm{Gau}(X\beta, \Delta + \tau^2 D)$, (5.4) 


where $\Delta$ is given by (1.4) (the same values of $\delta_1^2, \ldots, \delta_n^2$ as used in Section 5.1, and Cressie 
1990, are used here) and $D$ is given by (1.6), was performed 500 times, and each time the 
estimates $\hat{\tau}^2_{ml}$, $\hat{\tau}^2_{mm}$ and $\hat{\tau}^2_{rl}$ were computed. (Whenever a negative value was obtained, the 
estimate was set equal to zero.) The stem-and-leaf plots of the three sets of estimates are 
presented in Figures 1a, 1b and 1c, respectively. Notice the relatively larger number of zeros 
for the m.l. estimates (Figure 1a). 
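
A simulation of this kind can be sketched in Python as follows. This is a minimal illustration, not the author's program. The design matrix and the diagonals of the two variance matrices are hypothetical stand-ins, because the actual values are not reproduced in this section. The estimates are found by maximizing the profile ML and REML criteria over a grid that starts at zero, which is an assumed substitute for setting negative estimates to zero. The method-of-moments estimator is omitted, and 200 replications are used instead of 500 to keep the run short.

```python
import numpy as np


def profile_criteria(y, X, delta2, d, tau2_grid):
    """ML and REML log-likelihoods (up to constants), with beta profiled out."""
    ml = np.empty(len(tau2_grid))
    reml = np.empty(len(tau2_grid))
    for k, tau2 in enumerate(tau2_grid):
        s = delta2 + tau2 * d                       # diagonal of the variance matrix
        Xw = X / s[:, None]                         # Sigma^{-1} X
        XtSX = X.T @ Xw                             # X' Sigma^{-1} X
        beta_hat = np.linalg.solve(XtSX, Xw.T @ y)  # GLS estimate of beta
        r = y - X @ beta_hat
        ml[k] = -0.5 * (np.sum(np.log(s)) + np.sum(r * r / s))
        reml[k] = ml[k] - 0.5 * np.linalg.slogdet(XtSX)[1]
    return ml, reml


rng = np.random.default_rng(0)
n, reps = 51, 200                                   # the paper uses 500 replications
beta = np.array([1.0330, 0.000712, -0.000110])      # parameter values (5.3)
tau2 = 95.00

# Hypothetical design and variance inputs (assumptions for illustration only).
X = np.column_stack([np.ones(n), rng.uniform(0, 40, n), rng.uniform(0, 100, n)])
d = 1.0 / rng.uniform(5e5, 2e7, n)
delta2 = rng.uniform(2e-5, 2e-4, n)
tau2_grid = np.linspace(0.0, 400.0, 201)            # grid starts at zero

ml_est = np.empty(reps)
reml_est = np.empty(reps)
for b in range(reps):
    y = X @ beta + rng.normal(size=n) * np.sqrt(delta2 + tau2 * d)
    ml, reml = profile_criteria(y, X, delta2, d, tau2_grid)
    ml_est[b] = tau2_grid[np.argmax(ml)]
    reml_est[b] = tau2_grid[np.argmax(reml)]

print("ML   mean, sd:", ml_est.mean(), ml_est.std(ddof=1))
print("REML mean, sd:", reml_est.mean(), reml_est.std(ddof=1))
print("zeros (ML, REML):", int((ml_est == 0).sum()), int((reml_est == 0).sum()))
```

Because the REML criterion adds a term that is nondecreasing in the variance parameter, the REML maximizer on the grid is never smaller than the ML maximizer, which is consistent with the m.l. estimates hitting zero more often.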


Figure 1. Stem-and-leaf plots of estimated variance parameter $\tau^2$, based on 500 simulations 
of (5.4): (a) maximum likelihood (Section 3.1), (b) method-of-moments (Section 3.2) 
and (c) restricted maximum likelihood (Section 3.3). 
The means ($\bar{X}$) and standard deviations ($S$) of the distributions shown in Figure 1 are: 


$\hat{\tau}^2_{ml}$: $\bar{X} = 83.00$, $S = 45.65$; 
$\hat{\tau}^2_{mm}$: $\bar{X} = 96.85$, $S = 57.46$; 
$\hat{\tau}^2_{rl}$: $\bar{X} = 94.21$, $S = 49.17$. 


The means should be compared to the true value of $\tau^2 = 95.00$. The bias in $\hat{\tau}^2_{ml}$ is apparent; 
$\hat{\tau}^2_{rl}$ has very little bias and has a small advantage over $\hat{\tau}^2_{mm}$. With regard to standard deviations, 
the advantage of $\hat{\tau}^2_{rl}$ over $\hat{\tau}^2_{mm}$ is considerable, but it is at some disadvantage over $\hat{\tau}^2_{ml}$. For 
reasons explained in Section 3.3, that are not all statistical, bias is more of a concern than 
variance, and so REML estimation of $\tau^2$ should be considered a serious alternative to m.l. 

Asymptotic distribution theory for m.l. and REML can be checked from the simulations. 
(The method of moments is at a disadvantage in that no asymptotic distribution theory is readily 
available.) Substituting $\tau^2 = 95.00$ into (3.13) yields, 


$\{\mathrm{var}(\hat{\tau}^2_{ml})\}^{1/2} = 48.73$, 
which should be compared to $S = 45.65$. Finally, substituting $\tau^2 = 95.00$ into (3.29) yields, 
$\{\mathrm{var}(\hat{\tau}^2_{rl})\}^{1/2} = 50.14$, 


which should be compared to S = 49.17. 


The opportunity also exists to use the simulation to look at "actual" errors of prediction 
and to assess the performance of $M_1(\hat{\tau}^2)$ and $M_2(\hat{\tau}^2)$. If the parameter values (5.3) were 
estimated from the original data, then this amounts to a parametric bootstrap. 


6. CONCLUSIONS AND DISCUSSION 


Model-based prediction of undercount relies on careful checking of model fit. Diagnostic 
plots based on standardized residuals have already been suggested at the end of Section 2. The 
standardized BLUP residuals $\{Y_i - p_i(Y;\hat{\tau}^2)\}/\{[M(\hat{\tau}^2)]_{ii}\}^{1/2}$; $i = 1, \ldots, n$, also have a 
role to play. They could either be used in a quantile-quantile plot (e.g. Cressie 1991, p. 225) 
or, as suggested by Calvin and Sedransk (1991), plotted against $p_i(Y;\hat{\tau}^2)$; $i = 1, \ldots, n$. 

One could also extend the model (1.4) to include an unknown variance-component parameter 
$\sigma^2$: 


$Y \sim \mathrm{Gau}(F, \sigma^2\Delta)$, (6.1) 

where $\Delta = \mathrm{diag}\{\delta_1^2, \ldots, \delta_n^2\}$. Upon fitting the more general model (6.1), (1.5) and (1.6), one 
could then test whether the REML estimate $\hat{\sigma}^2_{rl}$ is significantly different from $\sigma^2 = 1$, which 
would provide a check on model misspecification. (In this case, REML estimation is recommended 
over m.l. estimation, since any bias will seriously affect inference on $\sigma^2$.) 

Restricted maximum likelihood (REML) estimation of variance-matrix parameters is less 
likely to lead to empirical Bayes predictors that put too much weight on the regression model 
(1.5). The price paid is slightly larger mean squared prediction errors. Using asymptotic dis- 
tribution theory for REML (which is checked by simulation), improved estimators of the mean 
squared prediction errors can also be obtained. Based on the model (1.4), (1.5) and (1.6), it 
can be concluded that there are accurate and precise ways to make inference on adjustment 
factors $\{F_i: i = 1, \ldots, n\}$; the predictors $\{p_i(Y;\hat{\tau}^2_{rl}): i = 1, \ldots, n\}$ yield true-count and 
undercount predictors, 


Tae Dist) Gruand Ur = N00 oY te) \ ole fly 24 Te, 


respectively. Their biases and mean-squared prediction errors can be obtained using the 
$\delta$-method (cf. Cressie 1991, Section 3.2.2). 
Hierarchical and Empirical Bayes Methods for Adjustment 
of Census Undercount: 
The 1988 Missouri Dress Rehearsal Data 
ABSTRACT 


The present article discusses a model-based approach towards adjustment of the 1988 Census Dress 
Rehearsal Data collected from test sites in Missouri. The primary objective is to develop procedures that 
can be used to model data from the 1990 Census Post Enumeration Survey in April, 1991 and smooth 
survey-based estimates of the adjustment factors. We have proposed in this paper hierarchical Bayes (HB) 
and empirical Bayes (EB) procedures which meet this objective. The resulting estimators seem to improve 
consistently on the estimators of the adjustment factors based on dual system estimation (DSE) as well 
as the smoothed regression estimators. 


KEY WORDS: Post Enumeration Survey; Adjustment factors; Dual system estimation; Hierarchical 
Bayes; Empirical Bayes; Variance components; EBLUP’s; Regression estimates; 
Standard errors. 


1. INTRODUCTION 


The present article discusses a model-based approach towards adjustment of the 1988 Census 
dress rehearsal data collected from test sites in Missouri. The main objective behind this exercise 
is to develop procedures that can be used to model data from the 1990 Census Post Enumeration 
Survey (PES) in April, 1991, and smooth survey-based estimates of the so-called "raw adjustment 
factors". These raw adjustment factors, which are ratios of estimates of the unknown 
total population to the corresponding 1990 Census count, are computed at various levels of 
aggregation (geographic areas such as cities, suburbs, etc.) crossed by various demographic 
categories (such as age, sex, race, etc.). The cross-classified categories are called poststrata. 

Before proceeding further, a brief historical anecdote is in order. Adjustment of 1980 
decennial census counts in the United States has been a topic of heated debate for nearly a 
decade. Despite the intensive efforts and the massive expenditure incurred by the U.S. Bureau 
of the Census to achieve near-complete coverage in the 1980 Census, there have been many 
lawsuits against the Bureau by individual states and cities demanding revision of the reported 
counts. In one such instance of litigation, by now well-publicized to the Statistics community 
in the articles of Ericksen and Kadane (1985) and Freedman and Navidi (1986), New York City 
among others sued the Census Bureau, and many reputed statisticians appeared as expert 
witnesses on either side. In particular Ericksen and Kadane appeared on the plaintiff’s side, 
and proposed a model-based approach towards the adjustment of census counts. They 
advocated shrinking the adjustment factors calculated on the basis of the PES data towards 
some suitable regression model. This approach documented in Ericksen and Kadane (1985) 
is similar to the one considered in Fay and Herriot (1979) or Morris (1983). Despite criticism 
of the Ericksen-Kadane approach by some statisticians (most severely by Freedman and 
Navidi (1986)), most people recognize the importance of the model-based approach for adjustment. 
Indeed, in this article, barring a few differences in the assumptions, to be pointed out 
later in Section 2, we use the Fay-Herriot or the Ericksen-Kadane model for the analysis of 
the 1988 Missouri Dress Rehearsal data. A different model-based approach which does not 
include covariates is given in Cressie (1989). 

A good description of the PES conducted as part of the 1988 Missouri Dress Rehearsal can 
be found in Childers and Hogan (1990). Hogan and Wolter (1988) discuss the categories of 
error that occur in a PES and a means of their evaluation. Basically, the PES design consists 
of a single stage stratified sample of blocks and dual system estimation of the number of persons 
by poststrata. 

In the present article, we begin at the point where a set of estimated raw adjustment factors 
and their covariances from the PES are available for modelling based on the 1988 Census Dress 
Rehearsal Data from the Missouri test sites. It is also assumed that a set of possible explanatory 
variables defined at the poststrata level and to be used in regression are also available. There are 
two geographic areas under consideration: the city of St. Louis which is a large central city, and 
East Central Missouri, which is a collection of areas of moderate population size. In defining 
the poststrata in St. Louis, persons were classified into the following demographic categories: 
(i) race: white non-Hispanic and others, (ii) owners and non-owners (renters) of dwellings, 
(iii) sex: male and female, (iv) age groups: 0-9, 10-19, 20-29, 30-44, 45-64 and 65+. This led 
to a total of 2 x 2 x 2 x 6 = 48 adjustment factors for St. Louis. In East Central Missouri, 
the sex and the age-group categories remained the same as in St. Louis, but instead of (i) and 
(ii), a new category (i)' classifying persons as (a) White non-Hispanic in Tape Address Register 
(TAR) areas, (b) White non-Hispanic in non-TAR areas, and (c) others in all areas was 
introduced. For East Central Missouri, a total of 3 x 2 x 6 = 36 adjustment factors were 
calculated. Thus, a total of 84 adjustment factors were used for modelling. Within each area, 
estimated adjustment factors were correlated due to the use of a block cluster sampling scheme. 
This led to a block-diagonal sample covariance matrix of the adjustment factors of dimensions 
48 x 48 and 36 x 36 corresponding to St. Louis and East Central Missouri, respectively. 
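
The poststrata counts quoted above can be reproduced by enumerating the cross-classifications. The Python sketch below does this; the category labels are shorthand for the groups named in the text, and the enumeration order is arbitrary (it is not claimed to match the numbering used for the figures).

```python
from itertools import product

age_groups = ["0-9", "10-19", "20-29", "30-44", "45-64", "65+"]
sexes = ["male", "female"]

# St. Louis: race x tenure x sex x age group.
st_louis = list(product(["white non-Hispanic", "other"],
                        ["owner", "non-owner"], sexes, age_groups))

# East Central Missouri: combined race/TAR category x sex x age group.
east_central = list(product(["white non-Hispanic, TAR",
                             "white non-Hispanic, non-TAR",
                             "other"], sexes, age_groups))

n_factors = len(st_louis) + len(east_central)
print(len(st_louis), len(east_central), n_factors)
```

The two lists correspond to the two diagonal blocks (48 x 48 and 36 x 36) of the sample covariance matrix of the 84 adjustment factors.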

In Section 2 of this article, we describe a general model-based method for obtaining smoothed 
adjustment factors, and the associated standard errors. Both the hierarchical and empirical 
Bayes methods are used. The EB method can also be regarded as a variance components method 
(see for example Harville (1985)). The formulas for posterior standard errors associated with 
the HB estimators are also provided. We may point out here that an EB method when employed 
naively can lead to serious underestimates of the associated standard errors. This is due to the 
fact that a naive EB method does not take into account the uncertainty due to estimation of 
the unknown variance components. However, Kackar and Harville (1984), and Prasad and 
Rao (1990) have suggested interesting approximations to the estimated mean squared errors 
(MSE's) of the EB estimators. Following their principle, we have derived formulas for the 
estimated MSE’s in the present context. We have also pointed out in this section how some 
(though not all) of the criticisms levelled against the Ericksen-Kadane (1985) procedure by 
Freedman and Navidi (1986) can be avoided in the present context. 

In Section 3, we have analyzed the actual data. The sample estimates, the HB estimates, 
the EB estimates and the regression estimates of the adjustment factors are all provided. Also, 
the associated standard errors are given. Both the HB method and the EB methods which take 
into account the uncertainty due to unknown prior parameters stand on par in their perfor- 
mance, and enjoy a clear-cut superiority over the raw estimates as well as the regression estimates 
in reducing the estimated standard errors. 


Finally, some of the technical details of this paper are given in the Appendix. 
2. HB AND EB ESTIMATION 


This section describes the general HB and EB estimation procedures for certain hierarchical 
models. The specific application to estimation of adjustment factors is considered in Section 3. 


The following hierarchical model is considered: 
I. $Y \mid \theta, \beta, \sigma^2 \sim N(\theta, V)$, where $V$ is a known $m \times m$ positive definite matrix; 
II. $\theta \mid \beta, \sigma^2 \sim N(X\beta, \sigma^2 I)$; 
III. $\beta$ and $\sigma^2$ are marginally independent with $\beta \sim$ uniform$(R^p)$ and $\sigma^2 \sim$ uniform$(0, \infty)$. 

The HB analysis is based on I-III. In the absence of precise prior information on $\beta$ and $\sigma^2$, 
we prefer the use of diffuse priors in III. We also analyzed the data with the prior pdf of $\sigma^2$ 
proportional to $\sigma^{-2}$ on $(0, \infty)$. The results were quite similar and are not reported. The 
following theorem is proved. 


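Theorem 1. Consider the model given in (I) - (III). Write $\Sigma = V + \sigma^2 I$. Suppose $m \ge p + 3$. 
Then (i) the conditional pdf of $\theta$ given $\sigma^2$ and $Y = y$ is $N(GV^{-1}y, G)$, where 


$G = V - V\Sigma^{-1}V + V\Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}V$; (2.1) 
(ii) the conditional pdf of $\sigma^2$ given $Y = y$ is 
$f(\sigma^2 \mid y) \propto |\Sigma|^{-1/2}\,|X^T\Sigma^{-1}X|^{-1/2}\exp(-\tfrac{1}{2}y^TFy)$, (2.2) 


where 


$F = \Sigma^{-1} - \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}$. (2.3) 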


The proof of the theorem is deferred to the appendix. Using formulas for conditional expectations 
and variances, one then gets 


$E(\theta \mid y) = E[E(\theta \mid \sigma^2, y) \mid y] = (E(GV^{-1} \mid y))\,y$; (2.4) 
$V(\theta \mid y) = V[E(\theta \mid \sigma^2, y) \mid y] + E[V(\theta \mid \sigma^2, y) \mid y] = V(GV^{-1}y \mid y) + E(G \mid y)$. (2.5) 


Using (2.2) and (2.3), one obtains $E(\theta \mid y)$ and $V(\theta \mid y)$ from (2.4) and (2.5) via numerical 
integration. 


The calculations involved in (2.1) - (2.3) can be somewhat simplified when one uses the 
spectral decomposition theorem for $V$. Thus, $V = PDP^T$, where $D = \mathrm{Diag}(d_1, \ldots, d_m)$, $d_i$ 
being the eigenvalues of $V$, and $P = (\xi_1, \ldots, \xi_m)$, $\xi_i$ being the corresponding orthonormal 
eigenvectors. Using the orthogonality of $P$, one now gets 

$|\Sigma| = |\sigma^2 I + D| = \prod_{i=1}^{m}(\sigma^2 + d_i)$; 

$\Sigma^{-1} = P(\sigma^2 I + D)^{-1}P^T$; 

$X^T\Sigma^{-1}X = (P^TX)^T(\sigma^2 I + D)^{-1}(P^TX)$; 

$F = P(\sigma^2 I + D)^{-1}P^T - P(\sigma^2 I + D)^{-1}(P^TX)[(P^TX)^T(\sigma^2 I + D)^{-1}(P^TX)]^{-1}(P^TX)^T(\sigma^2 I + D)^{-1}P^T$. 


The actual numerical integration over $\sigma^2$, which needs evaluation of the integrand at different 
values of $\sigma^2$, is somewhat simplified since $P$ and $X$ are known and $\sigma^2 I + D$ is a diagonal 
matrix. 
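
The computational saving from the spectral decomposition can be illustrated numerically. The Python sketch below uses a small randomly generated covariance matrix and design matrix (both hypothetical) and checks that the quantities needed in (2.2) and (2.3) can be built from the eigenvalues and the rotated design $P^TX$ without inverting the full matrix at each value of the variance parameter.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 8, 3
A = rng.normal(size=(m, m))
V = A @ A.T + m * np.eye(m)                 # hypothetical known covariance matrix
X = np.column_stack([np.ones(m), rng.normal(size=(m, p - 1))])
sigma2 = 0.7

# One-off decomposition: V = P diag(d) P^T.
d, P = np.linalg.eigh(V)
Z = P.T @ X                                 # rotated design, computed once

# Quantities for a given sigma2 using only diagonal operations.
w = 1.0 / (sigma2 + d)                      # diagonal of (sigma2 I + D)^{-1}
logdet_fast = np.sum(np.log(sigma2 + d))    # log |Sigma|
XtSiX_fast = Z.T @ (w[:, None] * Z)         # X^T Sigma^{-1} X
inner = np.diag(w) - (w[:, None] * Z) @ np.linalg.solve(XtSiX_fast, Z.T * w)
F_fast = P @ inner @ P.T

# Direct computation for comparison.
Sigma = V + sigma2 * np.eye(m)
Si = np.linalg.inv(Sigma)
F_direct = Si - Si @ X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)

print(np.isclose(logdet_fast, np.linalg.slogdet(Sigma)[1]))
print(np.allclose(F_fast, F_direct))
```

In a numerical integration over the variance parameter, only the vector `w` changes between evaluations, so the eigendecomposition and the product `P.T @ X` need to be computed once.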

Next we consider EB estimation. Then, one does not use III. First a Bayes estimator, i.e. 
the posterior mean of $\theta$, is obtained from I and II assuming $\beta$ and $\sigma^2$ to be known. This 
estimator is given by 


$\hat{\theta}^B = E(\theta \mid Y, \beta, \sigma^2)$ 
$= (V^{-1} + \sigma^{-2}I)^{-1}(V^{-1}Y + \sigma^{-2}X\beta)$ 
$= \Sigma^{-1}(\sigma^2 Y + VX\beta)$. (2.6) 
The corresponding posterior variance is given by 
$V(\theta \mid Y, \beta, \sigma^2) = (V^{-1} + \sigma^{-2}I)^{-1} = V - V\Sigma^{-1}V$. 


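However, in practice, $\beta$ and $\sigma^2$ are unknown, and are estimated via the maximum 
likelihood method from the marginal distribution of $Y$ which is $N(X\beta, \Sigma)$. These MLE's are 
denoted by $\hat{\beta}$ and $\hat{\sigma}^2$, where $\hat{\beta} = (X^T\hat{\Sigma}^{-1}X)^{-1}X^T\hat{\Sigma}^{-1}Y$, $\hat{\Sigma} = V + \hat{\sigma}^2 I$. Substituting such 
estimators of $\Sigma$, $\sigma^2$ and $\beta$ in (2.6), an EB estimator of $\theta$ is found as 


$\hat{\theta}^{EB} = \hat{\Sigma}^{-1}(\hat{\sigma}^2 Y + VX\hat{\beta}) = X\hat{\beta} + \hat{\sigma}^2\hat{\Sigma}^{-1}(Y - X\hat{\beta})$. (2.7) 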


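The estimator given in (2.7) is also obtainable as an estimated best linear unbiased predictor 
(EBLUP). First assume that $\sigma^2$ is known, and find the BLUP $\hat{\theta}^{BLUP} = X\tilde{\beta} + \sigma^2\Sigma^{-1}(Y - X\tilde{\beta})$ 
of $\theta$ where $\tilde{\beta} = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Y$. Next estimate $\sigma^2$ by $\hat{\sigma}^2$, its MLE, and correspondingly 
$\Sigma$ by $\hat{\Sigma}$. Substitution of $\hat{\sigma}^2$ and $\hat{\Sigma}$ in place of $\sigma^2$ and $\Sigma$ in $\hat{\theta}^{BLUP}$ results in the EBLUP $\hat{\theta}^{EB}$. 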
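
The EB (EBLUP) recipe just described can be sketched in Python. This is an illustration under stated assumptions, not the authors' program: the data are simulated toy values, the function names are hypothetical, and the MLE of the model variance is located by a simple grid search over the profile likelihood rather than by whatever iterative procedure was actually used.

```python
import numpy as np


def gls_beta(y, X, Sigma):
    """Generalized least squares: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y."""
    Si_X = np.linalg.solve(Sigma, X)
    return np.linalg.solve(X.T @ Si_X, Si_X.T @ y)


def profile_loglik(sigma2, y, X, V):
    """Log-likelihood of Y ~ N(X beta, V + sigma2 I) with beta profiled out."""
    Sigma = V + sigma2 * np.eye(len(y))
    r = y - X @ gls_beta(y, X, Sigma)
    return -0.5 * (np.linalg.slogdet(Sigma)[1] + r @ np.linalg.solve(Sigma, r))


def eb_estimate(y, X, V, grid):
    """Plug the (grid-search) MLEs into the BLUP to get the EB estimate."""
    sigma2_hat = max(grid, key=lambda s2: profile_loglik(s2, y, X, V))
    Sigma_hat = V + sigma2_hat * np.eye(len(y))
    beta_hat = gls_beta(y, X, Sigma_hat)
    resid = y - X @ beta_hat
    theta_eb = X @ beta_hat + sigma2_hat * np.linalg.solve(Sigma_hat, resid)
    return theta_eb, beta_hat, sigma2_hat


# Hypothetical toy data: 30 units, intercept plus one covariate.
rng = np.random.default_rng(2)
m = 30
X = np.column_stack([np.ones(m), rng.normal(size=m)])
V = np.diag(rng.uniform(0.5, 2.0, m))            # known sampling covariance
theta = X @ np.array([1.0, 0.5]) + rng.normal(size=m)
y = theta + rng.normal(size=m) * np.sqrt(np.diag(V))

grid = np.linspace(0.0, 5.0, 501)
theta_eb, beta_hat, sigma2_hat = eb_estimate(y, X, V, grid)

# The other algebraic form of the same estimator.
alt = np.linalg.solve(V + sigma2_hat * np.eye(m),
                      sigma2_hat * y + V @ X @ beta_hat)
print(np.allclose(theta_eb, alt))
```

The final comparison confirms that the two algebraic forms of the estimator agree once the same estimates are plugged in.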


A naive EB estimator of the variance matrix of $\hat{\theta}^{EB}$ is $V - V\hat{\Sigma}^{-1}V$. This is a gross 
underestimation of the variance matrix since uncertainty due to estimation of $\beta$ and $\sigma^2$ 
is not taken into account. If $\sigma^2$ is assumed known, and $\beta$ is assigned a uniform prior on 
$R^p$ ($m \ge p + 3$), then the HB estimator of $\theta$ is the same as $\hat{\theta}^{BLUP}$ and the posterior variance 
matrix is then $M = V - V\Sigma^{-1}V + V\Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}V$. This implies immediately 
that $E[(\hat{\theta}^{BLUP} - \theta)(\hat{\theta}^{BLUP} - \theta)^T] = M$, where expectation is taken over the 
joint distribution of $Y$ and $\theta$ given in I and II. Thus, in the Bayesian language, 
$V\Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}V$ can be interpreted as the excess in the posterior variability due to 
the uncertainty involved in $\beta$, while using the classical terminology, the same phenomenon can 
be interpreted as the excess in the MSE due to the same uncertainty. 
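
The relationship between the naive variance and the matrix $M$ can be checked numerically. The Python sketch below uses hypothetical inputs and treats the model variance as known. It computes the exact error covariance of the BLUP from its linear representation (an assumed but standard derivation) and compares it with the naive matrix plus the excess term for the regression coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)
m, p, sigma2 = 10, 3, 0.8
A = rng.normal(size=(m, m))
V = A @ A.T / m + np.eye(m)                      # hypothetical known covariance
X = np.column_stack([np.ones(m), rng.normal(size=(m, p - 1))])
I = np.eye(m)

Sigma = V + sigma2 * I
Si = np.linalg.inv(Sigma)
XtSiX = X.T @ Si @ X

naive = V - V @ Si @ V                           # ignores estimation of beta
extra = V @ Si @ X @ np.linalg.solve(XtSiX, X.T @ Si @ V)
M = naive + extra

# BLUP written as H Y; with Y = X beta + u + e and theta = X beta + u,
# the error is (H - I) u + H e because H X = X.
Q = X @ np.linalg.solve(XtSiX, X.T @ Si)
H = Q + sigma2 * Si @ (I - Q)
err_cov = sigma2 * (H - I) @ (H - I).T + H @ V @ H.T

print(np.allclose(err_cov, M))
print(np.all(np.diag(M) >= np.diag(naive)))
```

The excess term is positive semidefinite, so every diagonal entry of the naive matrix is no larger than the corresponding entry of $M$.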

We have the additional problem of tackling unknown $\sigma^2$. The Bayesian method enables us 
to find the posterior distribution of $\sigma^2$ given $Y = y$, while even without introducing a prior 
for $\sigma^2$, it is still possible to find an approximation to the MSE of $\hat{\theta}^{EB}$ by adapting an argument 
of Kackar and Harville (1984) or Prasad and Rao (1990). 

The necessary theorem whose proof is deferred to the Appendix is given below. 


Theorem 2. An approximate estimate of MSE of $\hat{\theta}^{EB}$ is given by 


$\mathrm{MSE}(\hat{\theta}^{EB}) \doteq V - V\hat{K}V + V\hat{K}^3V\,[2(\mathrm{tr}\,\hat{\Sigma}^{-2})^{-1}]$, (2.8) 
where 


$\hat{K} = \hat{\Sigma}^{-1} - \hat{\Sigma}^{-1}X(X^T\hat{\Sigma}^{-1}X)^{-1}X^T\hat{\Sigma}^{-1}$. (2.9) 


The third term in the right hand side of (2.8) can be interpreted as the excess in the mean squared 
error due to uncertainty in estimating $\sigma^2$. A general decomposition of the prediction error is 
given in Harville (1985). 

Although the posterior variances $V(\theta \mid y)$ associated with the HB estimator $\hat{\theta}^{HB}$ of $\theta$ and 
the estimated MSE of the EB estimator $\hat{\theta}^{EB}$ of $\theta$ are motivated from two distinct inferential 
philosophies, one common thread tying the two is that they both attempt to incorporate the 
uncertainty due to estimation of the model variance. For a better understanding of this, note 
that writing $K = \Sigma^{-1} - \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}$, 


$V(\theta \mid \sigma^2, y) = G = V - VKV$ (2.10) 


and $E(G \mid y)$ is approximated by $V - V\hat{K}V$, which is one of the two terms given in (2.8). Also, 
$E(\theta \mid \sigma^2, y) = GV^{-1}y$, and it can be shown after some simplification that $GV^{-1} = I - VK$. 
Thus, $V(GV^{-1}y \mid y) = V\,V(Ky \mid y)\,V$, and $V(Ky \mid y)$ is apparently approximated by 
$\hat{K}^3[2(\mathrm{tr}\,\hat{\Sigma}^{-2})^{-1}]$. However, as evidenced later in the numerical calculations of Section 3, the 
MSE approximation of $\hat{\theta}^{EB}$ need not match $V(\theta \mid y)$ perfectly. 

In Ericksen and Kadane (1985) one assumption involved was that of known $\sigma^2$. Freedman 
and Navidi (1986) insisted on estimation of $\sigma^2$, and we have in Theorems 1 and 2 accounted 
for this source of uncertainty both in a Bayesian and frequentist way. It should be noted that 
unlike previous work that addressed the estimation of net undercount of total population at 
the city and balance of state level, our interests lay in the estimation of adjustment factors at 
finer levels of detail. Operationally, adjustment at the finer levels allows for considerable savings 
in terms of time and computer costs as census files need to be used only once. Adjustment 
models using higher levels of geography would require several passes through the census data 
because they would require a method of distributing the undercount to lower levels of 
geography. Finally, correlation in the error structure allows the possibility of a non-diagonal 
V, another important generalization of the Fay-Herriot (1979) or Ericksen-Kadane (1985) 
model. Thus, the Freedman-Navidi criticism of lack of correlation across estimated adjustment 
factors does not hold against the present set up. The remaining main criticism of assuming 
the components of V to be known, whereas in reality these are sample based estimates, is yet 
to be resolved. Efforts are now being made to model the components of V as a function of 
variables such as the number of sample persons, the initial regression predictor, etc. It is hoped 
that such models will stabilize the estimated variances by reducing their variance. 

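Along with the HB and EB estimators of $\theta$, there are also the regression estimators given 
by $\hat{\theta}^{REG} = X(X^T\hat{\Sigma}^{-1}X)^{-1}X^T\hat{\Sigma}^{-1}Y$. The associated variance-covariance matrix is given by 
$\hat{M} - \hat{\sigma}^2(\hat{M}\hat{\Sigma}^{-1} + \hat{\Sigma}^{-1}\hat{M} - I)$, where $\hat{M} = X(X^T\hat{\Sigma}^{-1}X)^{-1}X^T$. 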


3. DATA ANALYSIS 


Let $Y_i = \mathrm{DSE}_i/\mathrm{Census}_i$ = adjustment factor $i$, $i = 1, \ldots, 84$, and $Y = (Y_1, \ldots, Y_{84})^T$. 
The set of explanatory variables $X$ is quite large when all possible interactions are considered. 
To simplify the analysis, experts at the Census Bureau were consulted and a reduced set of 
22 potential explanatory variables was considered for modelling purposes (see Huang et al. 
1991). The number of potential explanatory variables was also limited by the capability of the 
computer. The present model was selected using a best subset regression procedure with 
minimum Mallows' $C_p$ as the criterion over a set of 22 possible explanatory variables. Because 
the computer software required the input data to be in the ordinary least squares situation, 
we transformed the dependent and explanatory variables in the usual manner. Also, because 
$\sigma^2$ is unknown, an iterative procedure was used. 

As an aside, in selecting explanatory variables in the modelling process of adjustment factors 
for the 1990 Census, a slightly different procedure was used. In 1990, several explanatory 
variables were forced into the model and a best subset procedure was used to select additional 
explanatory variables. The change in procedure was made to counteract the potential for 
understating $\sigma^2$. (See Isaki et al. 1991). 

The $X$ matrix obtained via best subsets regression is of the form $X = (1_{84}, X_2, X_3, X_4, X_5, 
X_6, X_7, X_8, X_9, X_{10})$. All of the explanatory variables in $X$ are obtained from the 1988 Dress 
Rehearsal Census and defined at the poststrata level, the unit of analysis. $1_{84}$ is a unit vector; 
$X_2$ is the indicator variable for St. Louis; $X_3$ is the indicator variable for renters or is the 
proportion of renters for the East Central Missouri poststrata; $X_4$ through $X_7$ are indicator 
variables for age groups 0-9, 10-19, 20-29 and 30-44, respectively; $X_8$ is an indicator or 
proportion variable for males aged 20-64 that rent; $X_9$ is an indicator variable for other males 
aged 20-64; and $X_{10}$ is an indicator variable for other persons in St. Louis. 

Using the above design matrix, we obtained $\hat{\beta}$ = (.9812, -.0271, .0485, .0699, .0695, .0533, 
.0386, .0628, .0475, .0778)$^T$ and $\hat{\sigma}^2$ = .000574. The EB's or the EBLUP's and the associated 
approximate standard errors can now be computed using formulas derived in Section 2. For 
consistency, the HB analysis was also performed with the same $X$ matrix (we do not require 
$\hat{\beta}$ or $\hat{\sigma}^2$ for that analysis). 

In Figures 1 and 2 we plot the estimated adjustment factors and standard errors by poststrata. 
The first 12 poststrata refer to white non-Hispanic non-owners in St. Louis; poststrata 13-24 
refer to all other non-owners in St. Louis; poststrata 25-36 refer to white non-Hispanic owners 
in St. Louis and poststrata 37-48 refer to all other owners in St. Louis. Poststrata 49-60 refer 
to white non-Hispanic persons in Tape Address Register (TAR) areas in East Central Missouri; 
poststrata 61-72 refer to white non-Hispanic persons in non-TAR areas in East Central Missouri; 
poststrata 73-84 refer to all other persons in East Central Missouri. 

Within each group of 12 poststrata, the first six refer to males by age 0-9, 10-19, 20-29, 30-44, 
45-64 and 65+. We note in Figure 1 that the raw adjustment factors for the other group tend 
to be higher than those for the white non-Hispanic except for TAR area in East Central 
Missouri. The same observation nearly holds in Figure 2 concerning the raw standard errors. 
In Figure 3 a plot of the estimated standard errors versus the adjustment factors is provided. 
Figure 3. SE of Adjustment Factors by Raw Adjustment Factors. 

Figures 1 to 3 lead to several interesting conclusions. 


(1) For every stratum, the estimated standard errors of the HB and the EB estimators of the 
adjustment factors are much smaller than the standard errors of the raw adjustment factors 
when compared to the unadjusted DSE’s. 


(2) The EB estimators improve on the regression estimators for all the 84 strata by providing 
reduced estimated standard errors. Although the HB estimators do not improve on the 
regression estimators for all the strata, the improvement is substantial for most of the 
strata. 


(3) The data plots demonstrate that the difference between the point estimates $\hat{\theta}_i^{EB}$'s and 
$\hat{\theta}_i^{HB}$'s is quite small. Indeed, the percentage difference is always less than (and most often 
far less than) 1%. 


(4) The posterior standard errors associated with the HB estimates ($s_i^{HB}$) are always bigger 
than the approximate root MSE's of the EB estimates ($s_i^{EB}$). As discussed earlier, the two 
need not be the same. It is our feeling that the approximate standard errors of the EB 
estimates are often slight underestimates. However, a comparison of $s_i^{EB}$ with the naive EB 
standard errors reveals that a naive EB procedure can grossly 
underestimate the standard errors by failing to incorporate uncertainty due to 
estimation of $\sigma^2$. This deficiency is largely rectified by $s_i^{EB}$, which is based on second 
order approximations. 


At the time of revision of this article, adjustment of the 1990 Decennial Census was 
completed. The EB estimation procedure was used. Basically, most of the same steps followed 
in modelling the adjustment factors in the 1988 Dress Rehearsal Census were used. However, 
there were several differences. In the 1990 adjustment, the estimated adjustment factors were 
modelled by each of four census regions and a special set for Indian reservations. The number 
of adjustment factors ranged from 12 for the Indian set to 456 in one of the regions. In addition, 
estimated variances of the raw adjustment factors were smoothed via regression models. 
Smoothing of the estimated variances tended to reduce large estimated variances and increase 
small estimated variances. The net effect was an increase in the contribution of the associated 
adjustment factors with large estimated variances to the EB estimates and vice versa. Another 
difference was that outlier detection procedures were used in both the variance and adjustment 
factor smoothing. Finally, the EB estimates at the poststratum level were ratio adjusted to 
regional total population estimates derived from the raw adjustment factors. The ratio adjusted 
smoothed factors were then applied to related census population counts at the census block 
level. The results were then integer rounded by collection of blocks in such a manner that each 
cell within a block is rounded up or down to an integer and that control totals are off by at 
most one person. 


The procedures used to adjust the 1990 Census counts were pre-specified and the entire oper- 
ation was conducted under a very tight time schedule. The Bureau of the Census recommended 
that the 1990 Census adjusted counts be used. A special panel selected by the Secretary of 
Commerce was evenly divided on this issue. Upon weighing the evidence, the Secretary decided 
against using the adjusted counts. The issue is now subject to litigation. A current issue is the 
possible use of adjusted counts in postcensal estimation. Research on obtaining better 
adjusted counts for that purpose is currently underway. 
APPENDIX - PROOFS OF THE THEOREMS 


Proof of Theorem 1. We provide only an outline of the proof. The details appear in Datta 
et al. (1991). The joint (improper) pdf of $Y$, $\theta$, $\beta$ and $\sigma^2$ is given by: 


$f(y,\theta,\beta,\sigma^2) \propto \exp[-\tfrac{1}{2}(y - \theta)^TV^{-1}(y - \theta)]\,\sigma^{-m}\exp[-\tfrac{1}{2\sigma^2}\|\theta - X\beta\|^2]$, (A.1) 


where $\|\cdot\|$ denotes the Euclidean norm. Writing $P_X = X(X^TX)^{-1}X^T$, 
$\|\theta - X\beta\|^2 = [\beta - (X^TX)^{-1}X^T\theta]^T(X^TX)[\beta - (X^TX)^{-1}X^T\theta] + \theta^T(I - P_X)\theta$. 


Now, integrating with respect to $\beta$ in (A.1), it follows that the joint improper pdf of $Y$, $\theta$ 
and $\sigma^2$ is 


$f(y,\theta,\sigma^2) \propto \sigma^{-(m-p)}\exp[-\tfrac{1}{2}(y - \theta)^TV^{-1}(y - \theta) - \tfrac{1}{2\sigma^2}\theta^T(I - P_X)\theta]$. (A.2) 
Next writing $E = V^{-1} + \sigma^{-2}(I - P_X)$, it follows after some simplifications that 
$(y - \theta)^TV^{-1}(y - \theta) + \sigma^{-2}\theta^T(I - P_X)\theta =$ 
$(\theta - E^{-1}V^{-1}y)^TE(\theta - E^{-1}V^{-1}y) + y^T(V^{-1} - V^{-1}E^{-1}V^{-1})y$. (A.3) 
Hence, the posterior distribution of $\theta$ given $\sigma^2$ and $Y = y$ is $N(E^{-1}V^{-1}y, E^{-1})$. Using the 
familiar matrix inversion formula $(A + BDB^T)^{-1} = A^{-1} - A^{-1}B(D^{-1} + B^TA^{-1}B)^{-1}B^TA^{-1}$ 
(see for example Exercise 2.9, p. 33 of Rao (1973)), one gets $E^{-1} = G$. This completes 
the proof of the first part of the Theorem. Next, using (A.3) and integrating with respect to 
$\theta$ in (A.2), one gets the joint (improper) pdf of $Y$ and $\sigma^2$ is 
$f(y, \sigma^2) \propto \sigma^{-(m-p)}|E|^{-1/2}\exp[-\tfrac{1}{2}y^T(V^{-1} - V^{-1}E^{-1}V^{-1})y]$. (A.4) 
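Using Exercise 2.4, p. 32 of Rao (1973), it follows that 

$$ |E| = |V^{-1} + \sigma^{-2}(I - P_X)| = |V^{-1} + \sigma^{-2}I| \, |X^TX - \sigma^{-2}X^T(V^{-1} + \sigma^{-2}I)^{-1}X| \, / \, |X^TX|, $$

which on simplification reduces to 

$$ |E| = \sigma^{-2(m-p)} |V|^{-1} |\Sigma| \, |X^T\Sigma^{-1}X| \, / \, |X^TX|. \tag{A.5} $$

Also, after some calculations, it follows that 

$$ V^{-1} - V^{-1}E^{-1}V^{-1} = F. \tag{A.6} $$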


The proof of part (ii) of Theorem 1 follows now from (A.4) - (A.6) and noting that 
$f(\sigma^2 \mid y) \propto f(\sigma^2, y)$. Note, however, that the posterior pdf of $\sigma^2$ given $Y = y$ is proper. 

Proof of Theorem 2. Once again, only a sketch of the proof is given. The details are available 
in Datta et al. (1991). 


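Recall 

$$ \tilde\beta = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Y. $$

Define, 

$$ \tilde\theta = X\tilde\beta + \sigma^2\Sigma^{-1}(Y - X\tilde\beta). $$

Now, observe that (i) $\tilde\theta$ is the best unbiased predictor of $\theta$ (due to normality) for every fixed 
$\sigma^2$, and (ii) $E(\hat\theta^{EB} - \tilde\theta) = 0$ since $\hat\sigma^2$ is the MLE of $\sigma^2$ (cf. Kackar and Harville (1984)). Now 
using Lemma 3.3.1 of Datta (1990), $\hat\theta^{EB} - \tilde\theta$ is uncorrelated with $\tilde\theta - \theta$. Hence, 

$$ E[(\hat\theta^{EB} - \theta)(\hat\theta^{EB} - \theta)^T] = E[(\hat\theta^{EB} - \tilde\theta)(\hat\theta^{EB} - \tilde\theta)^T] + E[(\tilde\theta - \theta)(\tilde\theta - \theta)^T]. \tag{A.7} $$

Next, write $\theta^{B} = X\beta + \sigma^2\Sigma^{-1}(Y - X\beta)$. Then standard arguments give 

$$ E[(\tilde\theta - \theta)(\tilde\theta - \theta)^T] = E[(\tilde\theta - \theta^{B})(\tilde\theta - \theta^{B})^T] + E[(\theta^{B} - \theta)(\theta^{B} - \theta)^T]. \tag{A.8} $$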
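Our previous calculations yield 

$$ E[(\theta^{B} - \theta)(\theta^{B} - \theta)^T] = V - V\Sigma^{-1}V. \tag{A.9} $$

Further, 

$$ E[(\tilde\theta - \theta^{B})(\tilde\theta - \theta^{B})^T] = V\Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}V. \tag{A.10} $$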


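Finally, write $\tilde\theta = g(\sigma^2)$ and $\hat\theta^{EB} = g(\hat\sigma^2)$. Using first order Taylor approximation, one gets 

$$ E[(\hat\theta^{EB} - \tilde\theta)(\hat\theta^{EB} - \tilde\theta)^T] \approx E\left[(\hat\sigma^2 - \sigma^2)^2 \, \frac{\partial g(\sigma^2)}{\partial\sigma^2} \left(\frac{\partial g(\sigma^2)}{\partial\sigma^2}\right)^T\right]. \tag{A.11} $$

Since $g(\sigma^2) = Y - V\Sigma^{-1}[Y - X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Y]$, using matrix differentiation 
techniques, one gets 

$$ \frac{\partial g(\sigma^2)}{\partial\sigma^2} = VK\Sigma^{-1}(Y - X\tilde\beta), \tag{A.12} $$

so that 

$$ E\left[\frac{\partial g(\sigma^2)}{\partial\sigma^2} \left(\frac{\partial g(\sigma^2)}{\partial\sigma^2}\right)^T\right] = VK\Sigma^{-1} E[(Y - X\tilde\beta)(Y - X\tilde\beta)^T] \Sigma^{-1}KV. \tag{A.13} $$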
0 0 


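But, simple algebra gives 

$$ E[(Y - X\tilde\beta)(Y - X\tilde\beta)^T] = \Sigma - X(X^T\Sigma^{-1}X)^{-1}X^T. \tag{A.14} $$

Hence, from (A.13), 

$$ E\left[\frac{\partial g(\sigma^2)}{\partial\sigma^2} \left(\frac{\partial g(\sigma^2)}{\partial\sigma^2}\right)^T\right] = VK^3V. \tag{A.15} $$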


Survey Methodology, June 1992 107 


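Using one more approximation, it follows from (A.11) and (A.15) that 

$$ E[(\hat\theta^{EB} - \tilde\theta)(\hat\theta^{EB} - \tilde\theta)^T] \approx E(\hat\sigma^2 - \sigma^2)^2 \, VK^3V. \tag{A.16} $$

To estimate $E(\hat\sigma^2 - \sigma^2)^2 = \mathrm{MSE}(\hat\sigma^2)$, we proceed as follows. 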


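Since $Y \sim N(X\beta, \Sigma)$, write the likelihood function as 

$$ L \propto |\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(Y - X\beta)^T\Sigma^{-1}(Y - X\beta)\right]. \tag{A.17} $$

Hence, 

$$ \frac{d \log L}{d\sigma^2} = -\frac{1}{2}\frac{d}{d\sigma^2}\log|\Sigma| - \frac{1}{2}\frac{d}{d\sigma^2}\left[(Y - X\beta)^T\Sigma^{-1}(Y - X\beta)\right], \tag{A.18} $$

$$ \frac{d^2 \log L}{d(\sigma^2)^2} = -\frac{1}{2}\frac{d^2}{d(\sigma^2)^2}\log|\Sigma| - \frac{1}{2}\frac{d^2}{d(\sigma^2)^2}\left[(Y - X\beta)^T\Sigma^{-1}(Y - X\beta)\right]. \tag{A.19} $$

As before, denote by $d_1, \ldots, d_m$ the eigenvalues of $V$. 
Then, $\log|\Sigma| = \sum_{i=1}^{m} \log(\sigma^2 + d_i)$. Hence 

$$ \frac{d^2 \log|\Sigma|}{d(\sigma^2)^2} = -\sum_{i=1}^{m} (\sigma^2 + d_i)^{-2} = -\mathrm{tr}(\Sigma^{-2}). \tag{A.20} $$

Using (A.20) and matrix differentiation, it follows from (A.19) that 

$$ \frac{d^2 \log L}{d(\sigma^2)^2} = \frac{1}{2}\mathrm{tr}(\Sigma^{-2}) - (Y - X\beta)^T\Sigma^{-3}(Y - X\beta). \tag{A.21} $$

Thus, 

$$ E\left[-\frac{d^2 \log L}{d(\sigma^2)^2}\right] = -\frac{1}{2}\mathrm{tr}(\Sigma^{-2}) + \mathrm{tr}(\Sigma^{-2}) = \frac{1}{2}\mathrm{tr}(\Sigma^{-2}). $$

Approximating $E[(\hat\sigma^2 - \sigma^2)^2]$ by 

$$ \left\{E\left[-\frac{d^2 \log L}{d(\sigma^2)^2}\right]\right\}^{-1}, $$

justifiable by the asymptotic theory of maximum likelihood, one gets, from (A.16), 

$$ E[(\hat\theta^{EB} - \tilde\theta)(\hat\theta^{EB} - \tilde\theta)^T] \approx 2[\mathrm{tr}(\Sigma^{-2})]^{-1} VK^3V. \tag{A.22} $$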
Combining (A.7) - (A.10) and (A.22), one gets 

$$ \mathrm{MSE}(\hat\theta^{EB}) \approx V - V\Sigma^{-1}V + V\Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}V + VK^3V[2(\mathrm{tr}\,\Sigma^{-2})^{-1}]. \tag{A.23} $$

Substitution of $\hat\Sigma$ for $\Sigma$ yields the approximation given in (2.8). This completes the proof of 
Theorem 2. 
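
The variance approximation for $\hat\sigma^2$ used in the last step can be checked numerically. The sketch below assumes, as in the proof, that $\Sigma = V + \sigma^2 I$, so that the eigenvalues of $\Sigma$ are $\sigma^2 + d_i$; the eigenvalues and the value of $\sigma^2$ are hypothetical, and the function names are illustrative only. 

```python
from math import log

def log_det_sigma(sigma2, d):
    """log|Sigma| when Sigma = V + sigma2 * I and d holds the eigenvalues of V."""
    return sum(log(sigma2 + di) for di in d)

def trace_sigma_inv_sq(sigma2, d):
    """tr(Sigma^{-2}) computed from the eigenvalues of V."""
    return sum((sigma2 + di) ** -2 for di in d)

def approx_mse_sigma2_hat(sigma2, d):
    """Asymptotic approximation 2 / tr(Sigma^{-2}) to E(sigma2_hat - sigma2)^2."""
    return 2.0 / trace_sigma_inv_sq(sigma2, d)

# Hypothetical inputs, for illustration only.
d = [0.5, 1.0, 2.0, 4.0]
s2 = 1.5

# Central second difference of log|Sigma| in sigma2; it should be close to
# -tr(Sigma^{-2}), the relation used to obtain the expected information.
h = 1e-4
second_diff = (log_det_sigma(s2 + h, d) - 2.0 * log_det_sigma(s2, d)
               + log_det_sigma(s2 - h, d)) / h ** 2

print(second_diff, -trace_sigma_inv_sq(s2, d), approx_mse_sigma2_hat(s2, d))
```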
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of Population Totals 
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ABSTRACT 


The Population Estimates Program of Statistics Canada has traditionally been benchmarked to the most 
recent census, with no allowance for census coverage error. Because of a significant increase in the level 
of undercoverage in the 1986 Census, however, Statistics Canada is considering the possibility of adjusting 
the base population of the estimates program for net census undercoverage. This paper develops and 
compares four estimators of such a base population: the unadjusted census counts, the adjusted census 
counts, a preliminary test estimator, and a composite estimator. A generalization of previously-proposed 
risk functions, known as the Weighted Mean Square Error (WMSE), is used as the basis of comparison. 
The WMSE applies not only to population totals, but to functions of population totals such as popula- 
tion shares and growth rates between censuses. The use of the WMSE to develop and evaluate small- 
area estimators in the context of census adjustment is also described. 


KEY WORDS: Census adjustment; Undercoverage; Small area estimation. 


1. INTRODUCTION 


The Population Estimates Program of Statistics Canada provides a wide variety of detailed 
information about the characteristics and distribution of the Canadian population during the 
five-year period between each census. Intercensal estimates of population have many impor- 
tant uses, including the calculation of billions of dollars of transfer payments from the federal 
to provincial governments, the estimation of important demographic statistics such as birth 
and mortality rates, the planning of future levels of immigration, and the weighting of current 
population surveys such as the monthly Labour Force Survey. 

Traditionally, the estimates program is based on the most recent census, with no allowance 
for coverage error. In the 1986 Census, however, undercoverage increased significantly compared 
to previous censuses, and continued to be distributed unevenly across geographic and demographic 
groups. This caused considerable disruption to the estimates program and to the many other 
programs which use population estimates. As a result, a project was initiated in early 1989 to 
investigate whether, and if so how, the population estimates in the post-1991 Census period 
should be adjusted for estimated census coverage error. The research described in this paper was 
conducted as part of this project. For a more general description of the project, see Royce (1992). 

It should be noted that only the population estimates would be affected by this adjustment. 
The 1991 Census data will be published with no adjustment for undercoverage, other than the 
small adjustments that have traditionally been made to correct for underenumeration of tem- 
porary residents and for persons missed because their dwelling was misclassified as vacant. 
From the technical point of view, however, the question is quite similar to the issue of census 
adjustment that has been of interest to many statistical agencies in recent years. 

Two key questions in the adjustment issue are the degree to which census counts are improved 
by adjustment, and which adjustment methods are best. In this paper, we compare the accuracy 
of several different estimators of a set of population totals, using a weighted mean square error 
as our criterion. 

Section 2 introduces the topic by considering the simple case of a single population total. 
We derive and compare the Mean Square Errors of four possible estimators: the unadjusted 
census count, the adjusted census count, a preliminary test estimator, and a composite 
estimator. Section 3 extends the results to multiple population totals and to functions of popula- 
tion totals, such as population shares and growth rates. In Section 4, we consider methods for 
small-area estimation, specifically, the use of synthetic estimation and a special case of synthetic 
estimation known as across-the-board adjustment. Section 5 concludes with a description of 
areas for further research. 

In developing the estimators described in this paper, two assumptions were made. First, 
adjustment must result in estimates that are consistent across all geographic and demographic 
levels, as well as across time. Users consider it to be essential that parts add up to totals, and 
that there be no major breaks in the time series of estimates. Second, adjustment will be based 
on the combined results of Statistics Canada’s two coverage measurement studies: the Reverse 
Record Check, which measures gross undercoverage, and the Overcoverage Study, which 
measures gross overcoverage. Both studies are subject to sampling errors and non-sampling 
errors. 


2. SINGLE POPULATION TOTAL 


We first describe and compare four estimators for the case of a single population total. In 
comparing the estimators, we use the Mean Square Error (MSE) as our criterion. 


Let: $Y$ be the known census count; 
$T$ be the unknown true population total to be estimated; 
$U$ be the true net undercoverage, i.e., $U = T - Y$; 
$\hat U$ be an estimate of $U$ from the coverage measurement studies; 
$\sigma^2$ be the variance of $\hat U$; and 
$R$ be the relative bias of $\hat U$, i.e. $R = E(\hat U)/U - 1$. 


In the case of all four estimators, our estimate of T can be written as the census count plus 
some estimate of U. Thus, the MSE of our estimate of T will be the same as that of the 
corresponding estimate of U. The MSEs (and the WMSEs in later sections) are taken over 
hypothetical repetitions of the coverage measurement studies, treating the Census counts as 
fixed quantities. 

2.1 Unadjusted Census Estimator 

The unadjusted census estimate of U is zero. It has a bias equal to $-U$ and zero variance. 
Therefore $\mathrm{MSE}(\hat U^{0}) = U^2$. 

2.2 Adjusted Census Estimator 

The adjusted census estimator of U is $\hat U$. It has a bias of $UR$ and a variance equal to $\sigma^2$. 
Thus $\mathrm{MSE}(\hat U^{A}) = \sigma^2 + U^2R^2$. 
2.3. Preliminary Test Estimator 

A comparison of the MSEs of the previous two estimators suggests that we would use the 
adjusted census count in preference to the unadjusted census count whenever 

$$ \sigma^2 < U^2(1 - R^2). \tag{1} $$

Although the parameters in this inequality are unknown, they can (with the exception of R) 
be estimated from the coverage measurement studies. This suggests the possibility of using these 
estimates to develop a statistical test of the hypothesis that the inequality holds. The result of 
the test is then used to choose which estimator to use (thus the term preliminary test, or pretest, 
estimator). 
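
As a small illustration of the comparison behind the pretest idea, the following sketch evaluates the two MSEs of Sections 2.1 and 2.2 and the resulting rule (adjust when $\sigma^2 + U^2R^2 < U^2$). The function names and the numerical values are hypothetical and chosen only for illustration; in practice U and R are unknown. 

```python
def mse_unadjusted(U):
    """MSE of the unadjusted census estimate of U: bias -U, zero variance."""
    return U ** 2

def mse_adjusted(U, sigma2, R):
    """MSE of the adjusted census estimate: variance sigma2 plus squared bias (U*R)^2."""
    return sigma2 + (U * R) ** 2

def adjustment_reduces_mse(U, sigma2, R):
    """True when sigma^2 < U^2 (1 - R^2), i.e. when adjusting lowers the MSE."""
    return sigma2 < U ** 2 * (1.0 - R ** 2)

# Hypothetical case: net undercoverage of 30,000 persons, standard error
# of 12,000 for its estimate, and a relative bias of 10%.
U, sigma2, R = 30_000.0, 12_000.0 ** 2, 0.10
print(mse_unadjusted(U), mse_adjusted(U, sigma2, R), adjustment_reduces_mse(U, sigma2, R))
```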

Specifically, assume that $|R| < 1$ (obviously necessary for (1) to hold) and 
$\hat U \sim N(U(1 + R), \sigma^2)$, where $\sigma^2$ is known. Then $\hat U^2/\sigma^2$ has a non-central $\chi^2_{(1,\lambda)}$ distribution with 
non-centrality parameter $\lambda = U^2(1 + R)^2/2\sigma^2$. The null hypothesis $H_0: \sigma^2 \ge U^2(1 - R^2)$ 
is equivalent to the hypothesis $H_0: \lambda \le (1 + R)/2(1 - R)$. One approach, therefore, 
could be to adjust whenever $\hat U^2/\sigma^2 > c$, where the critical value $c > 0$ is chosen so that 

$$ \alpha = \Pr\left\{\chi^2_{(1,\,(1+R)/2(1-R))} > c\right\}, \tag{2} $$

where $\alpha$ is the significance level of the test. This is a special case of a more general test suggested 
by Toro-Vizcorrondo and Wallace (1968). 

Note that $\hat U^2/\sigma^2$ is the inverse of the square of the estimated coefficient of variation (CV) 
of $\hat U$. Thus, the criterion for adjustment can be interpreted in terms of a requirement to have 
a sufficiently small (in absolute value) CV. 

In practice, we would have to substitute some prior estimate of the relative bias, say r, for R 
in (2). The sensitivity of c to various values of R is examined in Royce (1991) for the case of 
a one-sided test (a normal distribution was used instead of a $\chi^2$ in this case). For example, with 
a significance level of 2.5%, it was found that the critical CV was only reduced from 33.8% 
to 27.1% even when the relative bias was as much as 50%. 
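
The two critical CVs quoted above can be reproduced with a short calculation. The details of the one-sided test are in Royce (1991) and are not repeated here, so the setup below is an assumption: at the boundary of the null hypothesis, $\sigma^2 = U^2(1 - r^2)$, the standardized estimate $\hat U/\sigma$ is taken to be normal with mean $\sqrt{(1 + r)/(1 - r)}$ and unit variance, and adjustment is made when it exceeds the upper $\alpha$ point of that distribution. The function name is illustrative. 

```python
from math import sqrt
from statistics import NormalDist

def critical_cv(r, alpha=0.025):
    """Critical CV of the undercoverage estimate for a one-sided normal test.

    Assumed setup: on the boundary sigma^2 = U^2 (1 - r^2), U_hat / sigma is
    N(sqrt((1 + r) / (1 - r)), 1); adjust when U_hat / sigma exceeds the
    upper-alpha point, i.e. when the estimated CV is below the returned value.
    """
    boundary_mean = sqrt((1.0 + r) / (1.0 - r))
    z = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 / (boundary_mean + z)

for r in (0.0, 0.5):
    print(f"r = {r:.1f}: critical CV = {100 * critical_cv(r):.1f}%")
# prints 33.8% for r = 0.0 and 27.1% for r = 0.5
```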

If $\sigma$ is not known but an estimate $\hat\sigma$ is available, then a similar test can still be constructed 
by assuming that 

$$ \frac{\nu\hat\sigma^2}{\sigma^2} \sim \chi^2_{(\nu)} \tag{3} $$

independent of $\hat U$. This leads to a test based on a non-central F distribution. Further details 
on the construction of such tests are given in Judge and Bock (1978). 

In order to determine the MSE of the preliminary test estimator, we note that it can be written 
as $\hat U^{P} = J\hat U$ where 

$$ J = \begin{cases} 1 & \text{if } \hat U^2/\sigma^2 > c, \\ 0 & \text{if } \hat U^2/\sigma^2 \le c. \end{cases} \tag{4} $$

When $\sigma^2$ is known, the MSE of this estimator can be shown to be (see, for example, Judge 
and Bock 1978, p. 72) 

$$ \mathrm{MSE}(\hat U^{P}) = \sigma^2 + U^2R^2 + (2U^2(1 + R) - \sigma^2)\Pr\{\chi^2_{(3,\lambda)} \le c\} - U^2(1 + R)^2\Pr\{\chi^2_{(5,\lambda)} \le c\}. \tag{5} $$

Note that as $c \to \infty$, i.e. as the chance of adjustment goes to zero, the MSE approaches $U^2$, 
the MSE of the unadjusted census. Similarly, as $c \to 0$, i.e. as the chance of adjustment goes 
to certainty, the MSE approaches $\sigma^2 + U^2R^2$, the MSE of the adjusted census estimator. 
Thus, the two previous approaches of adjustment or no adjustment can be interpreted as 
extreme cases of the pretest estimator procedure. 
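
The MSE of the preliminary test estimator and its two limiting cases can be evaluated without non-central chi-square routines. The sketch below is not the form given by Judge and Bock; it computes the same quantity directly from truncated-normal moments of $Z = \hat U/\sigma$ over the no-adjustment region $Z^2 \le c$, assuming $\hat U \sim N(U(1 + R), \sigma^2)$ with $\sigma$ known. The function name and test values are illustrative. 

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal

def mse_pretest(U, sigma, R, c):
    """MSE of J * U_hat, where J = 1 when U_hat^2 / sigma^2 > c and 0 otherwise."""
    mu = U * (1.0 + R) / sigma                    # mean of Z = U_hat / sigma
    a = sqrt(c)
    lo, hi = -a - mu, a - mu                      # limits for W = Z - mu ~ N(0, 1)
    p = _N.cdf(hi) - _N.cdf(lo)                   # probability of not adjusting
    ew = _N.pdf(lo) - _N.pdf(hi)                  # E[W ; lo <= W <= hi]
    ew2 = p + lo * _N.pdf(lo) - hi * _N.pdf(hi)   # E[W^2 ; lo <= W <= hi]
    ez = mu * p + ew                              # E[Z ; Z^2 <= c]
    ez2 = ew2 + 2.0 * mu * ew + mu * mu * p       # E[Z^2 ; Z^2 <= c]
    # E(U_hat - U)^2 - E[U_hat^2 ; no adjustment] + 2 U E[U_hat ; no adjustment]
    return sigma ** 2 + (U * R) ** 2 - sigma ** 2 * ez2 + 2.0 * U * sigma * ez

# Limits: c = 0 always adjusts, a very large c never adjusts.
print(mse_pretest(3.0, 2.0, 0.1, 0.0), mse_pretest(3.0, 2.0, 0.1, 1e4))
```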

Figure 1 shows MSE/$\sigma^2$ for the preliminary test estimator as a function of $U^2/\sigma^2$, for 
various values of c, in the unbiased case (R = r = 0). The MSEs/$\sigma^2$ of the unadjusted 
census and the adjusted census are also shown. In all cases, the MSE of the preliminary test 
estimator starts out higher than that of the unadjusted census, crosses the MSE of the adjusted 
census, reaches a maximum, and then approaches the MSE of the adjusted census. As the value 
of c decreases and the level of significance $\alpha$ of the test therefore increases, the MSE of the 
preliminary test estimator approaches that of the adjusted census more quickly, but at the 
expense of being higher for small values of $U^2/\sigma^2$. Thus, the performance of the preliminary 
test estimator over the range of possible values of $U^2/\sigma^2$ depends on the level of significance 
that is chosen for the test. 

Figures 2 and 3 show similar plots in the case where R = .5 and R = −.5 respectively 
(since we may feel we have no information on which to base an estimate of R, we have set 
r = 0). Again, the MSEs of the preliminary test estimators approach those of the adjusted 
census as $U^2/\sigma^2$ increases. With a positive bias the MSE of the preliminary test estimator 
approaches the MSE of the adjusted census more quickly than in the unbiased case, while for 
a negative bias the reverse is true. 

What is the "best" value of c for the test? Ideally, we would like to choose c so that the 
MSE of the preliminary test estimator is as close as possible to the minimum of the MSEs of 
the adjusted census and the unadjusted census. One approach, due to Sawa and Hiromatsu 
(1973) and extended by Brook (1976), is to minimize the maximum difference between the MSE 
of the preliminary test estimator and the minimum of the MSEs of the adjusted census and 
unadjusted census. For the unbiased case this criterion gives an optimal value of c of approx- 
imately 1.88. This corresponds to a critical CV (in absolute value) for the estimated under- 
coverage of 73%. The MSE of this estimator is shown in Figure 4. 
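
The minimax criterion can be explored numerically with a simple grid search. The sketch below is not the procedure of Sawa and Hiromatsu; it is a brute-force illustration for the unbiased case with $\sigma$ known, working in units of $\sigma$ (so $\theta = U/\sigma$), and the grids and function names are arbitrary choices. 

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()

def pretest_mse(theta, c):
    """MSE / sigma^2 of the pretest estimator when R = 0 and theta = U / sigma."""
    a = sqrt(c)
    lo, hi = -a - theta, a - theta
    p = _N.cdf(hi) - _N.cdf(lo)
    ew = _N.pdf(lo) - _N.pdf(hi)
    ew2 = p + lo * _N.pdf(lo) - hi * _N.pdf(hi)
    ez = theta * p + ew
    ez2 = ew2 + 2.0 * theta * ew + theta * theta * p
    return 1.0 - ez2 + 2.0 * theta * ez

def max_regret(c, thetas):
    """Largest excess of the pretest MSE over min(unadjusted MSE, adjusted MSE)."""
    return max(pretest_mse(t, c) - min(t * t, 1.0) for t in thetas)

thetas = [i * 0.02 for i in range(401)]          # U / sigma from 0 to 8
candidates = [1.0 + i * 0.01 for i in range(201)]  # c from 1.00 to 3.00
best_c = min(candidates, key=lambda c: max_regret(c, thetas))
print(f"minimax-regret c is about {best_c:.2f}; "
      f"critical CV is about {100 / sqrt(best_c):.0f}%")
```

The search should land near the value of c quoted above; the exact figure depends on the grid spacing. 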

Judge and Bock (1978) also describe other approaches to choosing the optimal value of c, 
such as minimizing the average distance (rather than the maximum difference) and Bayesian 
approaches. 


2.4 Composite Estimator 


The preliminary test estimator was written as $\hat U^{P} = J\hat U$, where J took on only the values 0 
or 1. Because of this inherent discontinuity, however, it has been shown that the preliminary 
test estimator is inadmissible (Cohen 1965). As an alternative, we might consider letting the 
multiplier of $\hat U$ take any value between 0 and 1. That is, instead of using the data to tell us 
whether to adjust, we use the data to tell us how much to adjust. This type of estimator has 
been suggested by Spencer (1980) and more recently by Andrews (1991). We define $\hat U^{C} = \alpha\hat U$ 
where $0 \le \alpha \le 1$. For a given $\alpha$, this estimator has a MSE equal to 

$$ \mathrm{MSE}(\alpha\hat U) = \alpha^2\sigma^2 + U^2(\alpha(1 + R) - 1)^2, \tag{6} $$

which is minimized when 

$$ \alpha = \frac{U^2(1 + R)^2}{(1 + R)(\sigma^2 + U^2(1 + R)^2)}. \tag{7} $$

Figure 1 Comparison of MSEs 
Figure 2 Comparison of MSEs 
Figure 3 Comparison of MSEs 
Figure 4 Comparison of MSEs 
Figure 7 Comparison of MSEs 
Figure 8 WMSEs for Totals 
Figure 9 WMSEs for Shares 

If $\sigma$ is assumed known, then a possible estimator of $\alpha$ is 

$$ \hat\alpha = \frac{\hat U^2}{(1 + r)(\sigma^2 + \hat U^2)} \tag{8} $$

and thus 

$$ \hat U^{C} = \frac{\hat U^3}{(1 + r)(\sigma^2 + \hat U^2)}. \tag{9} $$

The approximate MSE of this estimator can be found using a Taylor series approximation. 
Letting 

$$ h(\hat U,\sigma^2) = \frac{\hat U^3}{(1 + r)(\sigma^2 + \hat U^2)} \tag{10} $$

we get (dropping terms higher than those involving the first derivative) 

$$ \mathrm{MSE}(\hat U^{C}) \approx (h(U,\sigma^2) - U)^2 + \left(\frac{\partial h(U,\sigma^2)}{\partial U}\right)^2 (U^2R^2 + \sigma^2) + 2(h(U,\sigma^2) - U)\frac{\partial h(U,\sigma^2)}{\partial U}\,UR. \tag{11} $$

This approximation can also be extended to the case where $\sigma$ is unknown by making the 
assumption given in (3). The MSE is then increased by the additional term 

$$ \left(\frac{\partial h(U,\sigma^2)}{\partial\sigma^2}\right)^2 \frac{2\sigma^4}{\nu}. \tag{12} $$

Figures 5, 6 and 7 show the MSE of the composite estimator as a function of $U^2/\sigma^2$, as well 
as the MSEs of the unadjusted census, adjusted census, and the optimal preliminary test 
estimator from Section 2.3. In the unbiased case (Figure 5) and the positive bias case (Figure 6), 
the composite estimator outperforms the optimal preliminary test estimator. When the bias 
is negative, however, (Figure 7) the MSE of the composite estimator can be much higher than 
any of the other estimators over a considerable portion of the range of $U^2/\sigma^2$. 
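
A minimal sketch of the composite estimator of this section follows. It evaluates the MSE of $\alpha\hat U$ for a given $\alpha$, the minimizing $\alpha$ (which requires the unknown U and R), and the plug-in version in which $U(1 + R)$ is replaced by $\hat U$ and R by a prior guess r. The function names and inputs are hypothetical. 

```python
def mse_composite(alpha, U, sigma2, R):
    """MSE of alpha * U_hat when U_hat has mean U(1 + R) and variance sigma2."""
    return alpha ** 2 * sigma2 + U ** 2 * (alpha * (1.0 + R) - 1.0) ** 2

def optimal_alpha(U, sigma2, R):
    """Value of alpha that minimizes mse_composite for known U, sigma2 and R."""
    return U ** 2 * (1.0 + R) / (sigma2 + U ** 2 * (1.0 + R) ** 2)

def composite_estimate(u_hat, sigma2, r=0.0):
    """Plug-in composite estimate u_hat^3 / ((1 + r)(sigma2 + u_hat^2))."""
    return u_hat ** 3 / ((1.0 + r) * (sigma2 + u_hat ** 2))

# Hypothetical values in arbitrary units.
a = optimal_alpha(3.0, 4.0, 0.2)
print(a, mse_composite(a, 3.0, 4.0, 0.2), composite_estimate(2.0, 4.0))
```

When $\hat U^2$ is large relative to $\sigma^2$ the plug-in multiplier is close to $1/(1 + r)$, and when it is small the estimate is shrunk strongly toward zero. 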


3. MORE GENERAL ESTIMATORS 


In this section, we generalize the four estimators examined in Section 2 in two ways. First, 
instead of a single population total, we consider a vector of population totals, denoted as 
$T = (T_1, T_2, \ldots, T_N)$. Second, we consider not only the population totals themselves, but 
also functions of the population totals, denoted by $g(T) = (g_1(T), g_2(T), \ldots, g_K(T))$ 
where in general $K \ne N$. Typical functions of interest include population shares, used in the 
transfer of funds from the federal to provincial governments, as well as growth rates between 
censuses, differences in growth rates among different provinces, and so on. 

In evaluating the overall accuracy of some estimate $g(T^*)$ for $g(T)$, we will make use of 
a loss function. The use of loss functions for evaluating the effects of census adjustment is 
described in Fellegi (1980), Citro and Cohen (1985), Spencer (1986), and Wolter and Causey 
(1991) to name just a few. The specific loss function used in this paper is a generalization of 
previously-proposed loss functions for population totals and shares. Specifically, the risk 
(expected loss) of the estimator $g(T^*)$ is the Weighted Mean Square Error, defined as 

$$ \mathrm{WMSE}(g(T^*)) = E\left[\sum_{k=1}^{K} w_k (g_k(T^*) - g_k(T))^2\right] \tag{13} $$

where $w_k$ is a user-specified weight reflecting the importance of the k-th component of the loss 
function. 

Since g may be complex in practice, it is useful to work instead with an approximation to 
the WMSE derived by expanding $g(T^*)$ in a Taylor series around T. This yields: 

$$ \mathrm{WMSE}(g(T^*)) \approx \sum_{i=1}^{N}\sum_{j=1}^{N} \omega_{ij}\left[\mathrm{Cov}(U_i^*, U_j^*) + \mathrm{Bias}(U_i^*)\mathrm{Bias}(U_j^*)\right] \tag{14} $$

where the weight $\omega_{ij}$ is given by 

$$ \omega_{ij} = \sum_{k=1}^{K} w_k \frac{\partial g_k}{\partial T_i}\frac{\partial g_k}{\partial T_j}. \tag{15} $$

(Note that the approximate WMSE can also be written as the expectation of the quadratic form 
$(U^* - U)^T\Omega(U^* - U)$ where $\omega_{ij}$ is the ij-th element of $\Omega$.) 

This formulation conveniently splits each component of the risk function into two parts: 
a weight $\omega_{ij}$ that depends only on the $w_k$ and the function g, and the portion in square brackets 
which depends only on the particular estimator being used. 
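
The split between the weights and the estimator-specific terms can be sketched as follows, taking population shares $g_k(T) = T_k/\sum_i T_i$ as the function of interest. The weight matrix is built from the partial derivatives of g and the user weights, and the approximate WMSE is then the weighted sum of covariance and bias products. The totals, the weights $w_k = 1/T_k$, and the function names are hypothetical choices for illustration. 

```python
def omega_matrix(grad, w):
    """omega[i][j] = sum_k w[k] * grad[k][i] * grad[k][j], grad[k][i] = dg_k/dT_i."""
    n = len(grad[0])
    return [[sum(wk * gk[i] * gk[j] for wk, gk in zip(w, grad))
             for j in range(n)] for i in range(n)]

def approx_wmse(omega, cov, bias):
    """Weighted sum over i, j of omega_ij * (Cov_ij + Bias_i * Bias_j)."""
    n = len(bias)
    return sum(omega[i][j] * (cov[i][j] + bias[i] * bias[j])
               for i in range(n) for j in range(n))

def share_gradient(T):
    """Jacobian of the shares g_k(T) = T_k / sum(T)."""
    total = sum(T)
    n = len(T)
    return [[((total if i == k else 0.0) - T[k]) / total ** 2 for i in range(n)]
            for k in range(n)]

T = [600.0, 300.0, 100.0]                 # hypothetical true totals
w = [1.0 / t for t in T]                  # one possible choice of weights
omega = omega_matrix(share_gradient(T), w)

# Unadjusted census with undercoverage of 2% everywhere: zero covariance,
# bias -U_i proportional to T_i. Shares are unaffected, so the risk is zero.
no_cov = [[0.0] * 3 for _ in range(3)]
uniform_bias = [-0.02 * t for t in T]
print(approx_wmse(omega, no_cov, uniform_bias))
```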

While the choice of the $w_k$ can be arbitrary, considerations of equity have often led to the 
choice $w_k = 1/T_k$. In the case of population totals and shares, for example, the risk function 
(14) then becomes equivalent to those proposed by Fellegi (1980) and also used by Wolter and 
Causey (1991), among others. Other choices for the weights that have been suggested in the 
literature include $w_k = 1/Y_k$, $w_k = 1/T_k$, and $w_k = 1$. For further discussion on the merits 
of these various weightings, see the references cited above. Table 1 shows some examples of 
$\omega_{ij}$ for different functions. 

In the case of population growth rates, the first pair of subscripts on the $\omega$ refer to 
the population quantity of interest (e.g. province) while the second pair refer to the census at 
time 1 or time 2 respectively. The second subscript on the $T_i$ also refers to the census at time 1 
or 2. 

In the remainder of this section, we illustrate the use of the WMSE in developing and 
evaluating the unadjusted census, adjusted census, preliminary test estimator, and composite 
estimator. 


3.1 Unadjusted Census 
The WMSE of the unadjusted census is $\mathrm{WMSE}(\hat U^{0}) = \sum_{ij} \omega_{ij}U_iU_j$. 


3.2 Adjusted Census 


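The WMSE of the adjusted census is $\mathrm{WMSE}(\hat U^{A}) = \sum_{ij} \omega_{ij}[\sigma_{ij} + b_ib_j]$ where $\sigma_{ij} = 
\mathrm{Cov}(\hat U_i, \hat U_j)$ and $b_i = \mathrm{Bias}(\hat U_i)$. 

Table 1 
Examples of Weights $\omega_{ij}$ in the Approximate WMSE for Various Functions 

Function                     Weights 
Set of Population Totals     $\omega_{ii} = w_i$ 
Set of Population Shares     $\omega_{ii} = \frac{1}{T^4}\left(\sum_k w_kT_k^2 + w_iT^2 - 2w_iTT_i\right)$ 
                             $\omega_{ij} = \frac{1}{T^4}\left(\sum_k w_kT_k^2 - w_iTT_i - w_jTT_j\right)$, $i \ne j$ 
Set of Growth Rates          $\omega_{ii11} = w_iT_{i2}^2/T_{i1}^4$ 
                             $\omega_{ii12} = -w_iT_{i2}/T_{i1}^3$ 
                             $\omega_{ii22} = w_i/T_{i1}^2$ 
                             all other weights equal to 0 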


3.3. Preliminary Test Estimator 


As in Section 2.3, we would use the adjusted census in preference to the unadjusted census 
if the WMSE of the adjusted census is less than the WMSE of the unadjusted census, i.e., if 

$$ D = \sum_{ij} \omega_{ij}\left[U_iU_j - \sigma_{ij} - b_ib_j\right] > 0. \tag{16} $$

Tests for this type of hypothesis were suggested by Fellegi (1980) for the specific cases of 
population totals and population shares, but the ideas generalize quite readily to any function g. The 
left hand side of the inequality (16) is estimated by $\hat D = \sum_{ij} \omega_{ij}[\hat U_i\hat U_j - 2\sigma_{ij}]$ where the 
$\omega_{ij}$ are assumed to be known. (In practice the $\omega_{ij}$ are estimated by substituting either the census 
counts or the adjusted census counts in (13). Fellegi claimed that minor variations in the weights 
were unlikely to substantially change the test results.) It is then easy to show that $E(\hat D) = 
D + 2\sum_{ij} \omega_{ij}b_i(U_j + b_j)$. For the case of totals and shares, Fellegi presented arguments why 
it could be assumed that the second term was non-positive, i.e., $\sum_{ij} \omega_{ij}b_i(U_j + b_j) \le 0$ so 
that $\hat D$ would tend to underestimate D. Fellegi also derived an approximate variance for $\hat D$. 
This, along with the assumption that $\hat D$ was normally distributed, permitted the construction 
of a test for the hypothesis given in (16). 

Table 2 


z Values for Fellegi’s Tests for Adjustment of Provincial Population Totals and 
Shares, Reverse Record Check, 1976, 1981 and 1986 


Function 1976 1981 1986 
Totals 9.3 10.1 13.1 
Shares 3.1 1.8 es 


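In the more general case, the approximate variance of $\hat D$ is given by $\mathrm{Var}(\hat D) = 4\sum_{ii'} \sigma_{ii'} 
\left(\sum_{jj'} \omega_{ij}\omega_{i'j'}U_jU_{j'}\right)$. An estimate of $\mathrm{Var}(\hat D)$ can then be derived by substituting estimates of 
the $U_i$ and $\sigma_{ij}$ in this formula. 

In the case of totals, for example, the test statistic (z value) is given by 

$$ z = \frac{\hat D}{\sqrt{\widehat{\mathrm{Var}}(\hat D)}} = \frac{\sum_i (\hat U_i^2 - 2\hat\sigma_i^2)/Y_i}{2\sqrt{\sum_i \hat U_i^2\hat\sigma_i^2/Y_i^2}} \tag{17} $$

where in this case the inverse of the census counts have been used as the weights. A similar 
expression can be derived for population shares. 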

Table 2 shows the z values calculated for the censuses of 1976, 1981 and 1986 for provincial 
population totals and shares. The data come from the Reverse Record Checks conducted in 
these censuses. 

The case for adjusting population totals is much stronger than the case for adjusting shares, 
reflecting the fact that estimates of differences in undercoverage rates among provinces are 
less accurate than estimates of the undercoverage rates themselves. Further numerical results 
are given in Royce and Luc (1990). 
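
A test statistic of this kind can be sketched for general weights as follows. The variance used here is a first-order (delta-method) approximation that treats the covariances of the undercoverage estimates as known; it is offered as one reading of the approximate variance described above, not as Fellegi's exact derivation. The data are hypothetical (two areas, weights equal to the inverse census counts, independent estimates), and the function names are illustrative. 

```python
from math import sqrt

def d_hat(omega, u_hat, cov):
    """Estimate of D: sum over i, j of omega_ij * (u_i * u_j - 2 * cov_ij)."""
    n = len(u_hat)
    return sum(omega[i][j] * (u_hat[i] * u_hat[j] - 2.0 * cov[i][j])
               for i in range(n) for j in range(n))

def var_d_hat(omega, u_hat, cov):
    """First-order variance: 4 * sum_{i,k} cov_ik * a_i * a_k, a_i = sum_j omega_ij u_j."""
    n = len(u_hat)
    a = [sum(omega[i][j] * u_hat[j] for j in range(n)) for i in range(n)]
    return 4.0 * sum(cov[i][k] * a[i] * a[k] for i in range(n) for k in range(n))

def z_value(omega, u_hat, cov):
    """Ratio of the estimate of D to its approximate standard error."""
    return d_hat(omega, u_hat, cov) / sqrt(var_d_hat(omega, u_hat, cov))

# Hypothetical example for totals.
Y = [1000.0, 2000.0]
omega = [[1.0 / Y[0], 0.0], [0.0, 1.0 / Y[1]]]
u_hat = [30.0, 50.0]
cov = [[100.0, 0.0], [0.0, 225.0]]
print(d_hat(omega, u_hat, cov), var_d_hat(omega, u_hat, cov), z_value(omega, u_hat, cov))
```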


3.4 Composite Estimator 


A natural extension of the composite estimator of Section 2.4 would at first seem to be 
$\alpha_i\hat U_i$. However the use of different amounts of adjustment for each value of i introduces 
problems of consistency. For example, it would imply that more adjustment should be done 
at the Canada level than at the province level, since the estimates of undercoverage at the 
province level will be less accurate than for the national level. If this were done, the provincial 
totals would not add up to the Canada total. 

In practice, therefore, we constrain ourselves to a single value of alpha, i.e. $\hat U^{C} = \alpha\hat U$, 
where again $0 \le \alpha \le 1$. The WMSE of this estimator is 

$$ \mathrm{WMSE}(\hat U^{C}) = \sum_{ij} \omega_{ij}\left[\alpha^2(\sigma_{ij} + b_ib_j) + (\alpha - 1)^2U_iU_j + 2\alpha(\alpha - 1)U_ib_j\right], \tag{18} $$

which is minimized when 

$$ \alpha = \frac{\sum_{ij} \omega_{ij}U_i(U_j + b_j)}{\sum_{ij} \omega_{ij}\left[\sigma_{ij} + (U_i + b_i)(U_j + b_j)\right]}. \tag{19} $$

If, as was done in Section 3.3, we make the assumption that $\sum_{ij} \omega_{ij}b_i(U_j + b_j) \le 0$ then 
a lower bound for the optimal alpha is given by 

$$ \alpha \ge \frac{\sum_{ij} \omega_{ij}(U_i + b_i)(U_j + b_j)}{\sum_{ij} \omega_{ij}\left[\sigma_{ij} + (U_i + b_i)(U_j + b_j)\right]} \tag{20} $$

which we estimate by 
(21) 


assuming the $\omega_{ij}$ are known. In practice, as we did for the preliminary test estimator, we would 
estimate the $\omega_{ij}$ by substituting census counts or adjusted census counts in (15). 
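
The single-multiplier composite estimator for a vector of totals can be illustrated with a short sketch that evaluates the WMSE for a given $\alpha$ and the minimizing value of $\alpha$ when the true undercoverage, covariances and biases are treated as known. All inputs are hypothetical, the weight matrix is assumed symmetric, and the function names are illustrative. 

```python
def wmse_composite(alpha, omega, U, cov, bias):
    """WMSE of alpha * U_hat for weights omega, true U, covariances and biases."""
    n = len(U)
    return sum(
        omega[i][j] * (alpha ** 2 * (cov[i][j] + bias[i] * bias[j])
                       + (alpha - 1.0) ** 2 * U[i] * U[j]
                       + 2.0 * alpha * (alpha - 1.0) * U[i] * bias[j])
        for i in range(n) for j in range(n))

def optimal_alpha(omega, U, cov, bias):
    """Ratio sum omega_ij U_i (U_j + b_j) / sum omega_ij [cov_ij + (U_i+b_i)(U_j+b_j)]."""
    n = len(U)
    num = sum(omega[i][j] * U[i] * (U[j] + bias[j])
              for i in range(n) for j in range(n))
    den = sum(omega[i][j] * (cov[i][j] + (U[i] + bias[i]) * (U[j] + bias[j]))
              for i in range(n) for j in range(n))
    return num / den

# Hypothetical two-area example in arbitrary units.
omega = [[1.0, 0.2], [0.2, 2.0]]
U = [3.0, 5.0]
cov = [[1.0, 0.3], [0.3, 2.0]]
bias = [0.2, -0.1]
a = optimal_alpha(omega, U, cov, bias)
print(a, wmse_composite(0.0, omega, U, cov, bias),
      wmse_composite(a, omega, U, cov, bias),
      wmse_composite(1.0, omega, U, cov, bias))
```

Setting $\alpha = 0$ recovers the WMSE of the unadjusted census and $\alpha = 1$ that of the adjusted census, so the three printed risks can be compared directly. 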


In the case of population totals, for example, the estimated amount of adjustment is 
Of en ao SAA (22) 


where $\hat u_i$ is the estimated undercoverage rate, i.e. $\hat U_i/(Y_i + \hat U_i)$, and $\hat\sigma_i^2$ is its estimated 
variance. 


For shares, the amount of adjustment is given by 
Cp: SS rea eae ee (23) 


where $\hat u$ is the estimated undercoverage rate for the total population, i.e. $\sum_i \hat U_i/\sum_i (Y_i + \hat U_i)$, 
and $\hat\sigma^2$ is its estimated variance. The inverse of the adjusted census counts have been used as 
the weights in these two examples. 

3.5 Numerical Comparisons 


In the case of a single population total, it was possible to derive exact or approximate 
formulae for the MSEs of the four estimators as a function of $U^2/\sigma^2$, R, r and (in the case 
of the preliminary test estimator) the critical value of the test. Unfortunately, it has not yet 
been possible to derive similar expressions for the WMSEs of complex functions of a vector 
of population totals. 

In the case of the unadjusted census, adjusted census, and composite estimator, however, 
it is possible to estimate the WMSEs by substituting estimates of undercoverage and their 
estimated variances into equation (18) (if estimates of the bias terms are available they can be 
used, but in what follows we assume they are zero). For example, Figures 8 and 9 show, for 
the 1981 Census, the estimated ratio of the WMSE to the optimal WMSE, as a function of 
$\alpha$, where the provinces are again the units indexed by i. The extremes of $\alpha = 0$ and $\alpha = 1$ 
correspond to the unadjusted and adjusted census counts respectively, while the minimum point 
on the curve corresponds to the optimal $\alpha$. Figure 8 is for totals and Figure 9 is for shares. 
The optimum values of $\alpha$ were computed using formulae (22) and (23). 

In each case, the optimal degree of adjustment is close to 1.0, and results in a WMSE 
considerably lower than the WMSE corresponding to no adjustment (e.g. by a factor of almost 
70 for totals). The optimal degree of adjustment is less for shares than for totals, again 
reflecting the fact that estimates of differences in coverage rates between provinces are less accurate 
than the estimates of the rates themselves. It is also interesting to note that the WMSE for full 
adjustment is only slightly higher than that of the optimal degree of adjustment. This can have 
important practical significance, since it is much easier to explain a full adjustment to data 
users than to explain a partial adjustment. 


4. SMALL AREA ESTIMATION 


The previous two sections considered the case where direct estimates of undercoverage, and 
estimates of their variances, were available from the coverage measurement studies. This situa- 
tion applies, for example, for provinces, for some major Census Metropolitan Areas, and for 
broad demographic groups (e.g. age by sex, age by marital status) at the national level. However 
the Population Estimates Program produces estimates at very detailed levels, such as single 
years of age by sex by marital status for some 260 Census Divisions. Direct estimates of under- 
coverage generally do not exist at such levels. 

Nevertheless, the need to maintain consistency of the estimates requires that any adjustment 
made at a higher level be "carried down" to the detailed levels used by the estimates program. 
In this section, we consider the use of synthetic estimation for this purpose, and show how 
the WMSE can again be used to develop preliminary test estimators and composite estimators. 

The synthetic estimator is based on the assumption that net undercoverage is uniform within 
each of a number of "adjustment groups", indexed by a. The synthetic estimate is then given 
by $\hat U_i^{S} = \sum_a \lambda_{ia}\hat U_a$ where $\lambda_{ia} = Y_{ia}/Y_a$. For example, the adjustment groups might correspond 
to age-sex groups, for which estimates of undercoverage $\hat U_a$ are available at some higher level. 
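
The synthetic allocation can be sketched in a few lines: each adjustment group's estimated undercoverage is spread over the small areas in proportion to their census counts within that group. The counts and group estimates below are hypothetical, and the function name is illustrative. Because the proportions within each group sum to one, the small-area estimates add up to the total of the group estimates, which is the consistency property required of the estimates program. 

```python
def synthetic_estimates(Y, U_group):
    """Synthetic undercoverage estimates for small areas.

    Y[i][a] is the census count for area i and adjustment group a;
    U_group[a] is the estimated undercoverage for group a at the higher level.
    """
    n_groups = len(U_group)
    group_totals = [sum(row[a] for row in Y) for a in range(n_groups)]
    return [sum(row[a] / group_totals[a] * U_group[a] for a in range(n_groups))
            for row in Y]

# Hypothetical counts for three areas and two adjustment groups.
Y = [[100.0, 80.0], [300.0, 120.0], [600.0, 300.0]]
U_group = [20.0, 30.0]
U_syn = synthetic_estimates(Y, U_group)
print(U_syn, sum(U_syn))

# With a single adjustment group the allocation is a uniform proportional one.
print(synthetic_estimates([[180.0], [420.0], [900.0]], [50.0]))
```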

A special case of the synthetic estimator arises when there is only one adjustment group. 
Wolter and Causey (1991) have called this the across-the-board estimator. It is defined as 
$\hat U_i^{ATB} = \lambda_i\hat U$ where $\lambda_i = Y_i/Y$. WMSEs for the across-the-board and the synthetic estimator 
can be derived using equation (14). Since the $\omega_{ij}$ do not depend on the particular estimator 
used, only the portion in square brackets changes. Table 3 compares the estimators of $U_i$ and 
their covariance and bias terms for the census, adjusted census, across-the-board and synthetic 
estimators. 

Table 3 

Examples of Covariance and Bias Terms in the Approximate WMSE for Various Estimators 

Estimator          $U_i^*$                          $\mathrm{Cov}(U_i^*, U_j^*)$                                   $\mathrm{Bias}(U_i^*)$ 
Adjusted Census    $\hat U_i$                       $\sigma_{ij}$                                                  $b_i$ 
Across-the-Board   $\lambda_i\hat U$                $\lambda_i\lambda_j\sigma^2$                                   $\lambda_i(U + b) - U_i$ 
Synthetic          $\sum_a \lambda_{ia}\hat U_a$    $\sum_a\sum_{a'} \lambda_{ia}\lambda_{ja'}\sigma_{aa'}$        $\sum_a \lambda_{ia}(U_a + b_a) - U_i$ 

where $b = \sum_i b_i$ and similarly $b_a$ is the bias of $\hat U_a$; $\sigma_{aa'}$ denotes the covariance of $\hat U_a$ and $\hat U_{a'}$. 


4.1 Preliminary Test Estimators 


As was the case in Sections 2 and 3, the WMSE can be used to develop statistical tests to 
decide between two competing estimators. As an example, consider the situation where we wish 
to choose between the unadjusted census and the across-the-board estimator for population 
totals (shares are of course unchanged by across-the-board adjustment). On comparing the 
WMSEs of these two estimators, we find that we would use the across-the-board estimator 
in preference to the census counts if 


2TB 
2 2 uc, R>)ihd-oisese ss 24 
ora, UG | weak (24) 


where 
1 


Le 


B=1 (25) 


and $\tau_i = T_i/T$. This condition was given, in a different form, by Wolter and Causey (1991). 
B is a measure of the heterogeneity of undercoverage; it is non-negative, and is equal to zero 
if and only if the undercoverage is completely uniform. 

Noting that this inequality is the same as (1) except for the additional term in square brackets, 
we can derive a test very similar to the test described in Section 2.3. The critical value of the 
coefficient of variation will depend on the chosen significance level and the relative bias as 
before, but will also depend on B/U, the ratio of the heterogeneity of undercoverage to the 
overall undercoverage rate. 

Royce (1991) showed that, in practice, the effect of this additional factor on the critical CV 
was likely to be negligible. Thus, if adjustment is justified at some higher level, then carrying 
down the adjustment to lower levels is almost certainly justified as well. Similar results were 
found in a simulation study reported by Wolter and Causey (1991). 

4.2 Composite Estimators 


In Sections 2 and 3 we considered composite estimators where the two extremes were the 
unadjusted census and the adjusted census. With the addition of the synthetic and across-the- 
board estimators, the number of possible composite estimators increases considerably. For 
example, we might consider composite estimators involving the unadjusted census and the syn- 
thetic estimator, the adjusted census and the across-the-board estimator, the across-the-board 
estimator and the synthetic estimator, and so on. Consequently, we present below a method 
which can be used to derive a composite estimator involving any two estimators. 


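Our general composite estimator is defined as $g(\hat{T}^*) = \alpha \hat{U}_1 + (1 - \alpha)\hat{U}_2$, where $\hat{U}_1$ and $\hat{U}_2$ 
are two estimators. The WMSE of this estimator is 

$$\mathrm{WMSE}(g(\hat{T}^*)) = \alpha^2\,\mathrm{WMSE}(g(\hat{T}_1)) + (1 - \alpha)^2\,\mathrm{WMSE}(g(\hat{T}_2)) + 2\alpha(1 - \alpha)\,\mathrm{WMXPE}(g(\hat{T}_1), g(\hat{T}_2)), \qquad (26)$$

where 

$$\mathrm{WMXPE}(g(\hat{T}_1), g(\hat{T}_2)) = \sum_i w_i \left[\mathrm{Cov}(\hat{U}_{1i}, \hat{U}_{2i}) + \mathrm{Bias}(\hat{U}_{1i})\,\mathrm{Bias}(\hat{U}_{2i})\right] \qquad (27)$$

is defined to be the Weighted Mean Cross-Product Error of $g(\hat{T}_1)$ and $g(\hat{T}_2)$. The WMSE 
of our composite estimator is minimized when 

$$\alpha = \frac{\mathrm{WMSE}(g(\hat{T}_2)) - \mathrm{WMXPE}(g(\hat{T}_1), g(\hat{T}_2))}{\mathrm{WMSE}(g(\hat{T}_1)) + \mathrm{WMSE}(g(\hat{T}_2)) - 2\,\mathrm{WMXPE}(g(\hat{T}_1), g(\hat{T}_2))}. \qquad (28)$$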


To obtain an estimate of $\alpha$, we substitute estimates of the WMSEs and the WMXPE into the 
above. 
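
As a minimal numerical sketch of this substitution step, the following Python fragment evaluates the minimizing weight in (28) and the resulting composite WMSE in (26). The function names and the three input values are hypothetical and serve only to illustrate the arithmetic; they are not estimates from this paper.

```python
def optimal_alpha(wmse_1, wmse_2, wmxpe):
    """Weight on estimator 1 that minimizes the composite WMSE, as in (28)."""
    denom = wmse_1 + wmse_2 - 2.0 * wmxpe
    if denom <= 0:
        # A zero denominator means the two estimators have identical errors,
        # in which case the weight is not identified.
        raise ValueError("WMSE_1 + WMSE_2 - 2*WMXPE must be positive")
    return (wmse_2 - wmxpe) / denom


def composite_wmse(alpha, wmse_1, wmse_2, wmxpe):
    """WMSE of alpha * estimator 1 + (1 - alpha) * estimator 2, as in (26)."""
    return (alpha ** 2 * wmse_1
            + (1.0 - alpha) ** 2 * wmse_2
            + 2.0 * alpha * (1.0 - alpha) * wmxpe)


# Hypothetical estimated inputs, for illustration only.
wmse_1, wmse_2, wmxpe = 4.0, 9.0, 1.0
alpha = optimal_alpha(wmse_1, wmse_2, wmxpe)
best = composite_wmse(alpha, wmse_1, wmse_2, wmxpe)
print(alpha, best)
```

With these illustrative inputs the composite WMSE is smaller than the WMSE of either component estimator, which is the motivation for considering composite estimators in the first place.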


As an example of how this approach could be used, suppose a decision has been taken to 
adjust a provincial population total. To carry down the adjustment, we might consider using 
either across-the-board adjustment (i.e. adjust all sub-provincial quantities by the same factor), 
or a synthetic adjustment, where the adjustment is done separately within several age-sex 
groups. The across-the-board method has the advantage that it uses only the provincial estimate 
of undercoverage, which is likely to be more reliable than the estimates of undercoverage by 
age and sex at the province level. On the other hand, if undercoverage varies considerably among 
age and sex groups, and if the sub-provincial quantities indexed by i also differ in their age- 
sex composition, then the synthetic estimator may be better. 

If estimates of the $U_i$ are available from some source, then all covariance and bias com- 
ponents of the WMSEs and the WMXPE can be estimated (using formulae such as those in 
Table 3), and the optimum composite estimator involving the across-the-board and synthetic 
estimators can be estimated. Although for sub-provincial quantities the $U_i$ will not usually 
exist in practice, the method can be investigated at higher levels. For example we could use 
the provinces as the quantities indexed by i and use across-the-board and synthetic adjustment 
factors computed at the Canada level. A second possibility is to construct an artificial popula- 
tion (e.g. as in Schirm and Preston (1987) or Wolter and Causey (1991)) where the $U_i$ are 
assumed to be known. 

5. FURTHER WORK 


The results presented in this paper represent only a start to the investigation and comparison 
of the performance of various estimators of a set of population totals. There are several areas 
where considerable work is yet required. 

First, further investigation of the WMSEs for the preliminary test and composite estimators 
in the more general cases described in Sections 3 and 4 is required. Although attempts to derive 
analytic expressions for these WMSEs have not yet been successful, the more general results 
for preliminary test estimators and Stein-rule estimators described by Judge and Bock (1978) 
may yet be found to apply. If so, this would help to answer questions such as: Can optimal 
critical values be found for the Fellegi-type preliminary test estimators of Sections 3.3 and 4.1? 
How does the WMSRE of the preliminary test estimator compare in practice to those of the 
other three estimators? 

Second, more work is needed to explore the sensitivity of the results to different weightings 
in the loss function. The results of Section 3 were based on the use of a weight equal to the 
inverse of the census count or the adjusted census count for each province. If the provinces 
had been weighted differently, the results would change. A more general weight we might want 
to consider is $w_i = Y_i^{\gamma}$, where $\gamma$ is some type of power parameter. The sensitivity of the results 
in Section 3 to various values of $\gamma$ could then be studied. 

Finally, while the methods described in this paper provide a framework for developing and 
evaluating various estimators, the exact manner in which the methods will be applied has yet 
to be decided. Specific issues that must be resolved include: 


1. What is the relative importance of different types of functions such as totals, shares 
and growth rates? Different functions give rise to different results, but in the end a 
single estimator must be chosen in order to maintain consistency. 


2. At what geographic and demographic levels should these methods be applied? For 
example, should the preliminary test estimator or composite estimator described in 
Section 3 be applied at the province level, at the province by age group and sex level, 
or at even more detailed levels? The results obtained depend on the level of analysis 
used. 


3. Could we even consider composite estimators for “high profile” estimators such as 
the provincial population totals? It might be difficult to explain to users why the 
adjustments do not coincide with the published estimates of undercoverage. 


Because the resolution of issues such as these will require professional judgement, the deci- 
sion about whether to adjust (and how to adjust) cannot be an automatic one based on com- 
pletely pre-specified criteria. While the methods described in this paper can provide useful 
guidance, the final decision will require a careful balancing of the potential improvement in 
the accuracy of the estimates with consideration of how easily the methods can be communicated 
to and understood by users of the estimates program. 

The Creation of a Residential Address Register for Coverage 
Improvement in the 1991 Canadian Census 


L. SWAIN, J.D. DREW, B. LAFRANCE and K. LANCE¹ 


ABSTRACT 


The Address Register is a frame of residential addresses for medium and large urban centres covered 
by Geography Division’s Area Master File (AMF) at Statistics Canada. For British Columbia, the Address 
Register was extended to include smaller urban population centres as well as some rural areas. The paper 
provides an historical overview of the project, its objective as a means of reducing undercoverage in the 
1991 Census of Canada, its sources and product, the methodology required for its initial production, 
the proposed post-censal evaluation and prospects for the future. 


KEY WORDS: Address Register; Census undercoverage; Geographical Information Systems (GIS). 


1. OBJECTIVE 


The concept of an Address Register at Statistics Canada dates back to the 1960s. Fellegi 
and Krotki (1967) first considered building one for the 1971 Census using administrative source 
files as the base. Their approach was mostly manual and yielded a very complete set of addresses 
with minimal undercoverage and overcoverage. In the mid-1970s (Booth 1976), the idea resur- 
faced in planning for the 1981 Census. This time the approach started with data capture of 
addresses from the previous Census and was augmented with information from Canada Post. 
In both cases, the generated address lists were being considered as a frame for a mail-out Census. 
However, costs of creation were high and would have needed offsetting reductions in other 
Census operations to be effective. In addition, the risks associated with changing the tradi- 
tional enumeration method were considered too great. As a result, the construction of an 
Address Register was suspended in each case. 

A renewed interest in the concept of an Address Register emerged from the International 
1991 Census Planning Conference (Royce 1986, 1987) in October 1985. This interest derived 
from the potential for automation of Fellegi and Krotki’s approach due to technological 
developments, such as the availability of machine readable administrative files with addresses 
and postal codes and the development of in-house software to parse addresses into standard 
components, to assign postal codes and to link postal codes to Census geography. It followed 
as well from the development of a statistical theory for record linkage (Fellegi and Sunter 1969) 
and computer systems based on this theory (Hill and Pring-Mill 1985). 

As a result, a project was initiated in 1986 with the first research (Gamache-O’Leary et al. 
1987) investigating the use of an Address Register for a mail-out Census rather than the tradi- 
tional drop-off approach. It concluded that the new Census data collection approach would 
be less expensive only if the quality of the Address Register required minimal field updating 
prior to the Census. Two small pilot registers created in early 1987 put Address Register coverage 
at 90-95%, which was unacceptable without field updating (Drew et al. 1987), ruling out the 
use of an Address Register for a mail-out Census. 

However, the two pilot registers revealed the potential for an Address Register to aid in 
coverage improvement when used in conjunction with the traditional drop-off methodology. 
This fitted well with the emergence of coverage improvement as one of the top priorities for 
the 1991 Census. The results of the Reverse Record Check for the 1986 Census had indicated 
a dramatic rise in the undercoverage rate compared to previous Censuses (from 2.01% in 1981 
to 3.21% in 1986 for the national total population; from 2.08% in 1981 to 3.28% in 1986 for 
the national urban population) (Statistics Canada 1990). It was therefore decided that the 
research project should concentrate on the development of the Address Register to use in 
coverage improvement of the 1991 Census. 

The next section describes the two major tests conducted to develop and refine the procedures 
used to create the Address Register for the 1991 Census. As well, the second section outlines 
the joint agreement with the Province of British Columbia to extend the Address Register. The 
third section presents the administrative and geographic sources used in the production process 
and the structure and content of the Address Register booklets, the end product used by Census 
Representatives in the field. The fourth section describes the methodology used to exploit the 
sources in order to produce the Address Register booklets. In the fifth section, the proposed 
post-censal evaluation is discussed while the last section presents future prospects for the 
Address Register. A separate future report will detail an evaluation of the methodology. 


2. BACKGROUND 


2.1 The November 1987 Test of Coverage Improvement Methods 


A substantial test of the use of the Address Register (AR) as a coverage improvement tool 
was conducted in November 1987 in five large Regional Office cities. It was designed to estimate 
both undercoverage and overcoverage of dwelling units for the traditional Census method of 
listing and for two experimental methods of using an AR for Census coverage improvement: 
Post-list and Pre-list. The Post-list approach had the enumerator compile the dwelling list in 
the usual Census manner (creating a Visitation Record) then reconcile it with a dwelling list 
for the Enumeration Area (EA) derived from the AR. Field follow-ups were done where 
necessary on any address discrepancies between lists. In the Pre-list method, the enumerator 
was given the AR in advance and updated it during a canvass of the EA to create the final 
dwelling list. 

The results (van Baaren 1988) concluded that the Post-list method was the more effective 
in improving coverage. This approach as a simple add-on to the standard Census enumera- 
tion process was fail-safe. If for some reason we failed to produce the AR (either in whole or 
in part) on time for the 1991 Census, the AR reconciliation step could simply be dropped without 
affecting the traditional enumeration process. The test data also provided estimates of the degree 
of coverage improvement and costs (Royce and Drew 1988). It was estimated that 34,000 
occupied dwellings and 68,000 persons would be added by the AR to the medium and large 
urban centres for which it would be constructed (these urban centres representing those areas 
for which an Area Master File exists, i.e., covering about 65% of the Canadian population). 
This would represent an improvement in coverage of 0.26 percentage points (the national under- 
coverage rate in 1986 being estimated as 3.21 percent). Relative to the two previous attempts 
at AR construction, costs were demonstrated to be low to the Census due to the highly 
automated approach and the proven benefit. As well, the risk was minimized since the tradi- 
tional data collection method would still be used. Based on this cost, benefit and risk assess- 
ment, approval was given for creation of an AR for the 1991 Census. 

From the November 1987 test, two concerns presented themselves. First, the ordering of 
the addresses in the AR booklets produced for each Enumeration Area (EA) didn’t corres- 
pond to the order in the Visitation Records which made reconciliation a tedious and time- 
consuming task. Second, the overall overcoverage at 17% still seemed too high and more effort 
was required to eliminate erroneously placed or duplicate records. Both these problems were 
addressed by improving the methods for matching the AR to Census geography. Instead of 
linking addresses merely to EAs as had been done for the November test, procedures were devel- 
oped to match the AR to the Area Master File (AMF) (Statistics Canada 1988) blockfaces. An 
algorithm was developed to sort addresses by block and within block in the same order they 
would be encountered by the enumerator in walking around the EA. 


2.2 The September 1989 Test to Refine Procedures 


Another substantial test was conducted in September 1989 involving four cities of various 
sizes: Moncton, Laval, Brampton and Calgary. Each was chosen because of unique difficulties 
that could arise based on the November 1987 test. The results (Dick 1990) showed a signifi- 
cant decrease in coverage from 84% in the 1987 test to 73%, a discouraging outcome. On the 
other hand, this test revealed a considerable reduction in overcoverage down from 17% to 8%. 
Importantly, despite the reduced coverage of the AR, its performance as a coverage improve- 
ment tool for the Census was still viable. On analysis, the new geocoding operation was found 
to be problematic, both in terms of its high costs, since it involved a great deal of clerical 
intervention, and in terms of its quality. The geocoding steps were therefore revamped for 
production, a key aspect of which was the adoption of CANLINK record linkage software 
(Statistics Canada 1989b) to improve quality and reduce costs of the AR/AMF linkage. 


2.3 Joint Agreement with the Province of British Columbia 


The Ministry of Finance and Corporate Relations in British Columbia was concerned about 
the high rate of undercoverage in their province in the 1986 Census (4.49% in 1986, up from 
3.16% in 1981, for the provincial total population) (Statistics Canada 1990). Statistics Canada 
entered into a joint agreement with the Planning and Statistics Division (the provincial statistical 
agency) of the Ministry to help reduce undercoverage in British Columbia in the 1991 Census. 
Within this contract, the Address Register was expanded to include smaller urban areas in British 
Columbia, thereby increasing the population covered from 62% to 88%. 


3. SOURCES AND PRODUCT 


Production started in April 1990 and ended with the final Address Register (AR) booklet 
stapled in mid-May 1991, when 22,756 booklets had been compiled containing 6.6 million 
addresses for use in the Census data collection process. 


3.1 Administrative Sources 


In the September 1989 test, it was concluded that wherever possible the following four 
administrative sources ought to be used as sources of addresses to create the AR: telephone 
company billing files, municipal assessment rolls, hydro company billing files and the T1 
Personal Income Tax file. However, the use of all four sources was possible only in Nova Scotia, 
New Brunswick, and eight major urban centres in Ontario (Ottawa, Toronto, Brampton, 
Etobicoke, London, Mississauga, Hamilton and Windsor). Because of the multiplicity of files, 
the cost of files and refusals, only three sources were used for Newfoundland, Québec, 
Manitoba, Alberta (telephone, hydro and tax files) and for Regina and the rest of Ontario 
(telephone, assessment and tax files). For Saskatoon, only telephone and tax files were available. 
The primary source files used by the British Columbia government were those of telephone 
and hydro, though motor vehicles, cable and Elections files were also used. 


3.2 Geography Sources 

In building the AR, extensive use was made of a Geography Division system and files. 


i. The Area Master File (AMF) (Statistics Canada 1988) is a digitized feature network 
(covering streets, railroads, rivers, etc.) for medium and large urban areas, generally with 
populations of 50,000 or more. Of interest for the AR were the street features which con- 
tained street name and civic number ranges which could be used to locate individual 
addresses onto a blockface, the primary linkage. 


ii. The Computer Assisted Mapping System (CAM) orders blockfaces into blocks and blocks 
into a Census Enumeration Area (EA). CAM was used for the sequencing of addresses 
in the AR booklets. The EA maps produced by CAM were used by the Census Represen- 
tatives for the 1991 Census. For the AR, the maps for all AMF areas were used in the 
second clerical operation. 


iii. The 1990 Postal Code Conversion File (PCCF) (Statistics Canada 1991) is a national file 
of all postal codes, each of which is linked to a 1986 Census EA or a series of 1986 EAs. 
This input was used for secondary linkage of addresses at the EA level. 


iv. The 1986/1991 EA Correspondence File relates the 1986 EA geography to the 1991 
geography. This file was used for the secondary linkage at the EA level and the second 
clerical operation. 


3.3. Address Register Booklets 


The end product consisted of a set of booklets of residential addresses, one for each 
Enumeration Area, covering urban areas of Canada for which an Area Master File existed. 
Figure 1 contains a fictitious example of a page from an AR booklet (reduced in size). 

Each booklet was divided into two sections: a structured portion and an unstructured 
portion. The structured portion contained all the addresses tied to a blockface with all the 
blockfaces being sequenced into blocks within the EA. The sequencing mirrored that found 
on the map that the Census Representative (CR) used for listing the EA in his/her Visitation 
Record (VR). The unstructured portion contained the addresses that could be tied only to the 
EA rather than a blockface. These were sorted by odd/even civic numbers within street name. 
The volume of addresses split 90%-10% between structured and unstructured. 

Besides the address data, each page in an AR booklet contained a series of columns to be 
used in the reconciliation operation between the AR and VR. In the reconciliation, the Census 
Representative manually compared the Visitation Record with the AR to identify matches and 
non-matches. If the address was only on the VR, it was added to the AR (undercoverage in 
the AR). If the address was only on the AR, field resolution was usually required by the CR, 
with the result that the address was designated either as a new address to be enumerated for 
the Census by the CR (undercoverage in the Census) or as an invalid address classified by type 
of error (overcoverage in the AR). Addresses were denoted as invalid if they were duplicates, 
if they lay outside the EA, or for any other reason. All valid addresses had the Census Household 
Number coded in the booklet by the CR. A telephone number for the address, if available, 
was pre-printed in the last column of the booklet to assist the CR in any required Census 
follow-up operation. 

ADDRESS REGISTER     Protected     PROVINCE 35   FED 038   EA 261   VN 0     Page 21 of 22 

Columns on the page: (1) Block No.; (2) Civic No.; (3) Street Name; (4) Apt. No.; 
(5) Hhld No.; (6) Not Listed at Drop-off; (7) Field Follow-up Required; 
(8) Invalid: Duplicate; (9) Invalid: Outside EA; (10) Invalid: Other; (11) AR Ref No.; 
(12) Telephone Number. Columns 5 to 10 are blank on the pre-printed page. 

Block   Civic   Street       Apt.   AR Ref     Telephone 
No.     No.     Name         No.    No.        Number 
(1)     (2)     (3)          (4)    (11)       (12) 

4       23      MAIN ST             1044566    5551111 
4       19      MAIN ST             1044564    5561234 
4       15      MAIN ST             1044562    5552321 
4       11      MAIN ST             1044559 
4       9       MAIN ST             1044583    7475739 
4       7       MAIN ST             1044581    5552222 
5       30      CENTRE RD           1019615    5561029 
5       34      CENTRE RD           1019617 
5       34      CENTRE RD    BT     1019618    5564261 
5       60      CENTRE RD           1019627 
5       64      CENTRE RD           1019629    7478765 
5       68      CENTRE RD           1019634    5556942 
5       72      CENTRE RD           1019636 
5       76      CENTRE RD           1019640 
5       80      CENTRE RD           1019642    7476789 
5       84      CENTRE RD           1019644    5568765 
5       88      CENTRE RD           1019646    5559999 
5       92      CENTRE RD           1019579    7473456 
5       96      CENTRE RD           1019581    7450987 
5       100     CENTRE RD           1019648 
5       108     CENTRE RD           1019579    5557171 
5       112     CENTRE RD           1019581    5558888 
5       116     CENTRE RD           1019583    7462009 
5       120     CENTRE RD           1019586    7450235 
5       124     CENTRE RD           1019588    5569630 


Figure 1. Example of a Page from an AR Booklet (reduced in size). 
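
The reconciliation rules described in this section can be sketched as simple set operations. The fragment below is illustrative only: in the Census the comparison was done manually by the Census Representative, and the function name, the tuple representation of an address and the sample addresses are all assumptions made for the example.

```python
def reconcile(vr_addresses, ar_addresses):
    """Compare a Visitation Record (VR) list with an Address Register (AR) list.

    Each address is represented here as a (civic_number, street_name) tuple.
    """
    vr, ar = set(vr_addresses), set(ar_addresses)
    return {
        # On both lists: nothing further to do.
        "matched": sorted(vr & ar),
        # On the VR only: add to the AR (undercoverage in the AR).
        "add_to_ar": sorted(vr - ar),
        # On the AR only: field resolution decides between a new dwelling
        # (undercoverage in the Census) and an invalid AR address
        # (overcoverage in the AR).
        "field_follow_up": sorted(ar - vr),
    }


# Fictitious lists for one Enumeration Area.
vr_list = [(23, "MAIN ST"), (19, "MAIN ST"), (17, "MAIN ST")]
ar_list = [(23, "MAIN ST"), (19, "MAIN ST"), (15, "MAIN ST")]
result = reconcile(vr_list, ar_list)
print(result)
```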


4. METHODOLOGY 


In this section, the creation of the Address Register (AR) is described. Figure 2 provides 
an overview of the steps involved. 


4.1 Overview of the Methodology 


The free-format addresses contained on the source files were first standardized into ordered 
component parts (steps 1 and 2) in preparation for the use of subsequent software. Then, postal 
codes were confirmed or corrected (step 3) so that those areas or worksites for which the AR 
was to be created could be selected from among all the addresses and locations contained on 
the source files (step 4). Because the same addresses could be contained on more than one file 
or more than once on the same file, unduplication of addresses based on both exact and pro- 
babilistic matching took place (steps 5 and 6). 

Next, automated linkages were made of addresses to the blockface level using the Area 
Master File (step 7) or, where this was not possible, to Enumeration Area (EA) using the Postal 
Code Conversion File (step 8). After loading the addresses into a database management system 
(step 9), manual linkages were made of addresses to blockface (steps 10 and 11) or to EA 
(step 12). Addresses within each EA were then sequenced by and within blocks (step 13) before 
being printed and collated in booklets by EA (step 14) for use in the Census. 


4.2 Address Standardization (Steps 1, 2 and 3) 


The Postal Address Analysis System (PAAS - step 2 of Figure 2) (Statistics Canada 1989c) 
performed two tasks: it broke up the free-format addresses from the source files into their 
component parts (street name, civic number, street designator, street direction, apartment 
number, municipality, province, postal code) and composed the address search key (ASK). 
ASK is an ordered concatenation of all the components of an address and is used during 
unduplication. 
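
To make the idea of an address search key concrete, the sketch below builds a key from the components listed above. It is not the PAAS implementation: the class name, the component order, the separator and the normalization (upper-casing and collapsing blanks) are assumptions chosen for illustration, and the sample address is fictitious.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedAddress:
    """Component parts of one standardized address."""
    street_name: str
    civic_number: str
    street_designator: str = ""
    street_direction: str = ""
    apartment_number: str = ""
    municipality: str = ""
    province: str = ""
    postal_code: str = ""


def address_search_key(addr):
    """Ordered concatenation of all components (hypothetical layout)."""
    parts = (addr.street_name, addr.civic_number, addr.street_designator,
             addr.street_direction, addr.apartment_number,
             addr.municipality, addr.province, addr.postal_code)
    # Upper-case each part and collapse repeated blanks so that trivial
    # differences in case and spacing do not produce different keys.
    return "|".join(" ".join(p.upper().split()) for p in parts)


a = ParsedAddress("Main", "23", "St", municipality="Ottawa",
                  province="ON", postal_code="K1A 0T6")
b = ParsedAddress("main ", "23", "st", municipality="OTTAWA",
                  province="on", postal_code="K1A  0T6")
print(address_search_key(a))
```

Because the key is built from standardized components, two records that differ only in case or spacing yield the same key, which is what makes the later exact-match unduplication effective.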

Although PAAS was an excellent product, analysis from the 1989 prototype had revealed 
certain shortcomings that we felt could be resolved by grooming or filtering the administrative 
file contents prior to using the generalized software. This FILTER step (step 1) concentrated 
on the following tasks: eliminating special characters with which PAAS refused to deal, 
repackaging address components in a manner compatible with PAAS, translating street 
designator short forms to acceptable ones, introducing commas between the street and 
municipality components of the free-format address to improve PAAS’s comprehension, 
eliminating leading zeroes from civic numbers and numeric street names, and adding 
municipality and province names. 
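
A few of the FILTER tasks listed above can be sketched as follows. This is a simplified illustration, not the production code: the designator table, the function name and the choice of which characters count as "special" are assumptions.

```python
import re

# Hypothetical short forms mapped to the forms assumed acceptable downstream.
DESIGNATOR_MAP = {"STR": "ST", "AVEN": "AVE", "BLV": "BLVD"}


def filter_address(raw, municipality, province):
    """Groom one free-format street address before standardization."""
    text = raw.upper()
    # Replace special characters (anything other than letters, digits,
    # underscores, apostrophes, hyphens and blanks) with a blank.
    text = re.sub(r"[^\w' -]", " ", text)
    tokens = text.split()
    # Translate street designator short forms.
    tokens = [DESIGNATOR_MAP.get(t, t) for t in tokens]
    # Eliminate leading zeroes from purely numeric tokens.
    tokens = [(t.lstrip("0") or "0") if t.isdigit() else t for t in tokens]
    street = " ".join(tokens)
    # Add municipality and province, separated from the street by commas.
    return f"{street}, {municipality.upper()}, {province.upper()}"


cleaned = filter_address("#0023 main str.", "Ottawa", "on")
print(cleaned)
```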

The FILTER and PAAS steps were applied in an iterative fashion. The first step was to 
discover what anomalies needed filtering for each administrative source. If the PAAS error 
rate after filtering was greater than 5%, error records were reviewed to find recurring problems 
that could be successively eliminated by further filtering until an error rate of less than 5% 
was achieved. As any address record that failed address standardization was eliminated from 
further consideration, it was vital to have a PAAS success rate as high as possible. 

The PCVERFY step (step 3) used the Automated Postal Coding System (PCODE) (Statistics 
Canada 1989a) package for confirmation and generation of postal codes. It was not quite as 
effective as the PAAS software at address analysis and could only confirm or add postal codes 
for 84% of the output from PAAS. It confirmed 78% of the postal codes and changed another 
6%. Only 0.003% of the source administrative records had arrived with no existing postal code. 
It was crucial to have correct postal codes because these would be used for worksite selection 
in the subsequent step. 

Two problems arose in the PCVERFY step during production. If an address was missing 
a municipality/province component, the software continued to attempt to find a postal code 
instead of suspending further processing. As a consequence, enormous amounts of processing 
time could be spent trying to find postal codes. This problem was solved by including in the 
FILTER a step to add municipality and province names. The second problem occurred when 
a street name was numeric, as the processing time per address increased fourfold. This problem 
was not resolved and will necessitate modifications to the PCODE software. 


4.3. Worksite Selection (Step 4) 


This step partitioned the country by postal code into manageable worksites for processing 
with the sizes of worksites being based on the efficiency of CANLINK software for linkage 
of multiple large files. A geographic partitioning into worksites was adopted so they had 
dwelling counts in the 100,000 to 150,000 range based on the 1986 Census. Worksites were 
formed from an individual AMF (for a medium sized city), collections of physically adjacent 
AMFs (for small towns/townships), or parts of an AMF (for a large city). Geography Division’s 
Postal Code Conversion File (PCCF) which links postal codes and detailed Census geography 
was used to do this partitioning in the SELECT step (step 4). Once partitioning was completed, 
there were 105 distinct worksites and the original 43.4 million addresses had been reduced to 
20.5 million addresses, with the dropped addresses having postal codes outside the AMF areas 
(i.e., smaller cities and rural areas). 


4.4 Unduplication (Steps 5 and 6) 


In order to delete addresses included more than once on the source files, an unduplication 
process was conducted in two stages: an exact match with DEEXACT (step 5) and a probabilistic 
match using CANLINK software (step 6). 

The DEEXACT step utilized the address search key (ASK) produced by the PAAS soft- 
ware and all records with an identical ASK were collapsed into a single record. With 
DEEXACT, the 20.5 million records from the SELECT step were reduced down to 10.1 million 
records. This reduction shows the importance of performing the address standardization. 
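
The exact-match stage amounts to grouping records on the address search key. The following sketch assumes each record is an (ASK, source) pair; the function name and the sample keys are hypothetical, and the real DEEXACT step may have retained different information for each collapsed record.

```python
def collapse_exact(records):
    """Collapse records with an identical address search key (ASK).

    records: iterable of (ask, source) pairs.
    Returns a dict mapping each distinct ASK to the set of sources
    on which it appeared.
    """
    merged = {}
    for ask, source in records:
        merged.setdefault(ask, set()).add(source)
    return merged


# Fictitious records from three administrative sources.
records = [
    ("MAIN|23|ST", "telephone"),
    ("MAIN|23|ST", "hydro"),
    ("MAIN|19|ST", "tax"),
    ("MAIN|23|ST", "tax"),
]
unique = collapse_exact(records)
print(len(records), "->", len(unique))
```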

Step 6 utilized the CANLINK generalized record linkage software (Statistics Canada 1989b). 
It clusters close records into groups called “pockets” and only records within the same pocket 
are actually matched together. For this application, civic number was used as the pocket. The 
components of the address (street name, municipality name, postal code, etc.) were used for 
matching purposes and weights were assigned for agreement or disagreement of each compo- 
nent. The development of levels of partial agreement for street name, municipality name and 
the last three characters of the postal code allowed for spelling variations and letter transposi- 
tions within the fields. The CANLINK step accounted for a further reduction to 6.7 million 
records. More details on the use of CANLINK in address unduplication are given in Drew et al. 
(1988), where its application in the November 1987 test is described. 
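
The pocket and weighting ideas can be illustrated with a small sketch. It does not reproduce CANLINK: the weights, the similarity threshold and the use of Python's `difflib.SequenceMatcher` as the rule for partial agreement are assumptions, and for brevity partial agreement is applied to whole fields rather than, for example, only the last three characters of the postal code.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical (agreement, disagreement) weights for each compared field.
WEIGHTS = {
    "street": (6.0, -4.0),
    "municipality": (3.0, -2.0),
    "postal_code": (5.0, -3.0),
}


def field_weight(field, a, b):
    """Weight for one field: full agreement, partial agreement or disagreement."""
    agree, disagree = WEIGHTS[field]
    if a == b:
        return agree
    # Partial agreement tolerates spelling variations and transpositions.
    if SequenceMatcher(None, a, b).ratio() >= 0.8:
        return agree / 2.0
    return disagree


def candidate_pairs(records):
    """Yield pairs of records that fall in the same pocket (civic number)."""
    pockets = defaultdict(list)
    for rec in records:
        pockets[rec["civic"]].append(rec)
    for members in pockets.values():
        yield from combinations(members, 2)


def total_weight(a, b):
    """Sum of the field weights for a pair of records."""
    return sum(field_weight(f, a[f], b[f]) for f in WEIGHTS)


# Fictitious records; the second has a letter transposition in the street name.
recs = [
    {"civic": "23", "street": "MAIN ST", "municipality": "OTTAWA", "postal_code": "K1A0T6"},
    {"civic": "23", "street": "MIAN ST", "municipality": "OTTAWA", "postal_code": "K1A0T6"},
    {"civic": "25", "street": "MAIN ST", "municipality": "OTTAWA", "postal_code": "K1A0T6"},
]
pairs = list(candidate_pairs(recs))
print(len(pairs), total_weight(*pairs[0]))
```

Only the two records with civic number 23 are compared, which shows how pockets limit the number of comparisons; a pair would then be declared a duplicate if its total weight exceeded a chosen threshold.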


4.5 AR/AMF Linkage (Step 7) 


The major concern from the 1989 test was the strategy used to link addresses to their respec- 
tive blockface. Because of the drop in coverage of 11 percentage points (from 84% in the 1987 
test to 73%), a thorough investigation was needed and possibly a new approach. The other concern 
was that automated matching accounted for only 80% of the records matched while the other 
20% were picked up clerically. This would have represented a daunting manual workload in 
full production. In order to circumvent these two concerns, another CANLINK application 
was developed for the AR/AMF linkage (step 7). 

The original 1989 test files for Brampton still existed, so this became the test site for 
developing this step. The revised approach yielded 10% more matches, which increased the 
coverage back up to 1987 levels. As well, the automated matching was now responsible for 
97% of the matches with 3% being picked up clerically, a significant improvement on the earlier 
80%-20% split. Based on these results, the CANLINK approach was adopted for Census 
production. 

In the construction of the new matching strategy, the first area of study involved a com- 
parison of the contents of fields that would be used for matching purposes. This revealed certain 
anomalies that could be corrected prior to use to improve the number of linkages. The 
processing modifications to existing fields covered the following areas: removal of blanks 
between compound street names; alignment of street directions and civic numbers; conver- 
sion of numeric street names to numbers (on the AMF); removal of special characters in street 
names (on the AMF); correction of spelling variations in municipalities (on the AR); and a 
recreation of certain PAAS translations for street names (on the AR). Several new fields were 
also generated: NYSIIS (New York State Identification and Intelligence System) and 
SOUNDEX versions of the street name, employing two phonetic encoding packages used to 
eliminate the effects of common spelling errors (Statistics Canada 1989d); a duplicate street 
name flag (on the AMF) to identify situations where a street name was not unique; a unidirec- 
tional street flag (on the AMF) to identify streets that had only a single street direction coded; 
and an official street name flag (on the AR) to indicate that the street name matched an official 
AMF street name. The AMF records contained only street data so we appended the Census 
Subdivision name and a province code and then attempted to assign postal codes to blockface 
civic numbers. When the postal codes differed between the “from” and “to” civic numbers, 
we generated subblockfaces for each unique postal code. 

For this application, three distinct pockets were created for each record, effectively 
triplicating the files. The primary pocket was the most stringent in nature and was designed 
to find all the good match possibilities quickly in the first pass of the files. It was composed 
of street name/Forward Sortation Area (FSA)/odd or even civic number flag. The second 
pocket was postal code/odd or even civic number flag which allowed for poorly parsed addresses 
to be matched on postal code. The third was the NYSIIS version of the street name/odd or 
even civic number flag which allowed records with spelling variations in street name and missing 
postal codes to be considered as potential matches. 
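
The three pockets can be pictured as three blocking keys computed for every record. The Python sketch below is illustrative only: the record layout, the field names and the function names are hypothetical, and the NYSIIS version of the street name is assumed to have been computed already. It relies on the fact that the Forward Sortation Area is the first three characters of the postal code.

```python
from typing import Dict, NamedTuple, Tuple


class AddressRecord(NamedTuple):
    # Hypothetical record layout; the real AR and AMF records held more fields.
    street_name: str
    street_nysiis: str  # phonetic version of the street name, precomputed
    postal_code: str    # six characters, possibly empty
    civic_number: int


def parity_flag(civic_number: int) -> str:
    """Odd or even civic number flag."""
    return "E" if civic_number % 2 == 0 else "O"


def pocket_keys(rec: AddressRecord) -> Dict[str, Tuple[str, ...]]:
    """Return the three pocket values for one record."""
    flag = parity_flag(rec.civic_number)
    fsa = rec.postal_code[:3]  # Forward Sortation Area
    return {
        "primary": (rec.street_name, fsa, flag),     # most stringent pocket
        "postal": (rec.postal_code, flag),           # rescues poorly parsed addresses
        "phonetic": (rec.street_nysiis, flag),       # rescues spelling variations
    }
```

Two records are compared only if they share the value of at least one pocket, which is why each file is effectively processed three times.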

The function rules established for partial matches for street name, municipality name and 
the last three characters of the postal code were taken directly from our existing CANLINK 
application used for internal unduplication where they had already demonstrated their 
effectiveness. 

However, there were three AMFs to which we had difficulty matching in the course of 
production: Red Deer, St. Thomas and Charny. The problem with all three was missing civic 
number data on the AMF. Because these would require heavy clerical intervention, a field
operation was mounted in December 1990 to update the maps from the Computer Assisted 
Mapping System (CAM). CAM maps from Geography Division were sent to Regional Office 
staff who added the missing civic number ranges. These updated maps were subsequently 
forwarded to Geography Division for inclusion in the next round of updates to the AMF. For 
the creation of the AR, the civic number ranges for the three AMFs were used manually in 
the clerical operation. 

Success in matching was quite similar across all provinces except for Québec. In Québec, 
the automatic matching to the blockface dipped by about 10-12% to 73% because the matching was
less effective at dealing with French addressing than with English addressing. Three situations were
identified as causes for the drop in the automatic match rate: the use/non-use of articles within 
the street name (e.g., Savane, de la Savane, la Savane), the use of complete personal names 
as street names with a high degree of spelling variability (e.g., Jean-Francois Belanger, 
J.F. Belanger and Jean F. Belanger) and the lack of street designators. As a result, the clerical 
operations described below, especially the first one, were of increased importance for matching 
in Québec relative to the other provinces. 

During the AR/AMF processing with the CANLINK software, the only problem that arose 
was in exceeding an internal pocket maximum on the number of records allowed. The solution 
was to identify the streets causing the problem from the pocket report (they were always major 
thoroughfares) and set up special pre-processing programs that would add the fifth digit of 
the postal code in calculating the pocket value for those streets to make it more discriminating. 
This had the effect of reducing the number of records within the pocket. 


4.6 AR/PCCF Linkage (Step 8)


This step (step 8) attempted to obtain an automated link to the proper Enumeration 
Area (EA) for those addresses which could not be matched to the blockface using the AMF 
in step 7. 

The principal inputs were the Postal Code Conversion File (PCCF), which gave the 
correspondence between postal codes and 1986 EAs, and the 1986 to 1991 EA Correspondence 
File. By matching the two together we could identify postal codes that were uniquely matched 
to a single 1991 EA, as well as postal codes matched to two or more possible 1991 EAs, requiring 
manual work to resolve later in step 12. 
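
The matching of the two files can be sketched as a simple join. The Python fragment below is a minimal illustration under assumed file layouts (pairs of postal code and 1986 EA, and pairs of 1986 EA and 1991 EA); the function names are hypothetical and the actual processing was not done this way.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def candidate_1991_eas(
    pccf: Iterable[Tuple[str, str]],               # (postal code, 1986 EA)
    ea_correspondence: Iterable[Tuple[str, str]],  # (1986 EA, 1991 EA)
) -> Dict[str, Set[str]]:
    """Map each postal code to the set of 1991 EAs it could belong to."""
    ea86_to_ea91 = defaultdict(set)
    for ea86, ea91 in ea_correspondence:
        ea86_to_ea91[ea86].add(ea91)
    candidates = defaultdict(set)
    for postal_code, ea86 in pccf:
        candidates[postal_code] |= ea86_to_ea91.get(ea86, set())
    return dict(candidates)


def split_unique(candidates: Dict[str, Set[str]]):
    """Separate postal codes with exactly one candidate EA from the rest."""
    unique = {pc: next(iter(eas)) for pc, eas in candidates.items() if len(eas) == 1}
    ambiguous = {pc: eas for pc, eas in candidates.items() if len(eas) > 1}
    return unique, ambiguous
```

Postal codes in the first group can be coded to an EA automatically; those in the second group are the ones left for the manual work of step 12.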

Again, Brampton became the test vehicle. The analysis of the postal code/EA matching 
revealed that 38% of the postal codes could be uniquely assigned to a 1991 EA. The linkage 
to these postal codes of the AR records unmatched to a blockface yielded a further 5% increase 
in total matches. Overall, the automated match rate increased to 89% (84% to the blockface 
and 5% to the EA), up from 64% in the September 1989 test, almost cutting in half the amount 
of manual intervention. 


4.7 Loading the Base (Step 9) 


To facilitate queries and in anticipation of future usage, ORACLE had been used in the 
1989 test as the database management system and was used again for the 1991 production. The 
ORACLE load step (step 9) involved the transformation of the up-to-now sequential file into 
four separate component files, one for each of municipality, blockface, street and address. 


4.8 Clerical Procedures (Steps 10, 11 and 12) 


The clerical procedure for the 1989 test was a review of all unique combinations of street 
name/street designator/street direction from both AMF and AR records along with an AR 
record count for each street combination. The objective was to replace an unmatched AR street 
combination with the legitimate AMF combination. By comparing similar street combinations
and determining which ones should in fact have been identical, hitherto uncoded AR records 
could be matched manually to a particular blockface. This procedure had worked well in 1989 
and had proved useful in two problem situations: those where there were large discrepancies 
in street name spelling and those where the AR street name field contained both the street name 
and a street designator short form that the PAAS software had not understood in parsing the 
address. 

We expanded the capability of this clerical procedure (step 10) to compare AR street com- 
binations with other similar AR street combinations to handle instances where a particular street 
might have a number of AR spelling variations with no AMF equivalent. This expansion 
permitted some additional manual coding of addresses to blockface. 

To summarize, in this first clerical procedure (Clerical-1), all addresses not coded 
automatically to blockface in step 7 (that is, those coded automatically to EA in step 8 and 
those not yet coded) were examined for possible manual coding to blockface. 

Following the Clerical-1 procedure, we added a Compress step (step 11), which was applied 
to all records coded to the blockface. For each unique value of street name/street 
designator/street direction within a worksite, all the corresponding address records were 
checked for uniqueness using the civic number/apartment number as the key. Where multiple 
records occurred, they were collapsed with all pertinent data blended into one single record, 
a further step of unduplication. 
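
A minimal sketch of the Compress step follows, in Python. The field names and the blending rule (keep the first record and fill its empty fields from later duplicates) are assumptions made for illustration; the article does not specify how the pertinent data were blended.

```python
from typing import Dict, List


def compress(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Collapse records that share street combination, civic number and apartment.

    "street" stands for the street name/street designator/street direction
    combination within one worksite (hypothetical field name).
    """
    merged: Dict[tuple, Dict[str, object]] = {}
    for rec in records:
        key = (rec["street"], rec["civic_number"], rec.get("apartment", ""))
        if key not in merged:
            merged[key] = dict(rec)  # first record for this address
            continue
        kept = merged[key]
        for field, value in rec.items():
            # Assumed blending rule: fill a field only if it is still empty.
            if not kept.get(field) and value:
                kept[field] = value
    return list(merged.values())
```

Under this rule a telephone number present on only one of two duplicate records survives on the single collapsed record.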

As a result, at the end of step 10, the database contained addresses coded automatically or
manually to blockface, automatically to EA or uncoded as yet. 

Step 12 now dealt with those residual addresses that could not be linked to a unique EA 
but could be matched to two or more possible EAs via step 8. A complete set of CAM-generated
maps was produced for the AR project. The Clerical-2 step consisted of examining these maps 
for the candidate EAs to assign these residual addresses to the proper EA wherever possible. 

Overall, the ratio of automated to manual matching was 91%-9%. The automated portion 
comprised 87% from the AR/AMF linkage to blockface, and 4% from the AR/PCCF linkage 
to EA. The manual portion was split 3% matched to the blockface from the Clerical-1 operation 
and 6% to the EA in Clerical-2. 

Although ORACLE was an appropriate vehicle for the 1989 prototype, it proved to be costly 
and eventually a bottleneck once in full production with the AR as just one user on a Bureau-wide
database. It allowed only 8-10% of the worksites to be on-line at any one time, so sites had
to be exported and imported continuously to free up space and reloaded to carry on processing.
A second ORACLE database was therefore set up for exclusive use of the AR team. In fairness 
to ORACLE, not all the processing being done was conducive to any database management 
system. The product was being built and as a consequence large portions of the tables were 
being examined to make sweeping field changes, to eliminate duplication and to select records 
for printing. ORACLE did offer tremendous flexibility to change software procedures quickly 
and generate new ones as production unfolded. 


4.9 Use of the Computer Assisted Mapping System (Step 13) 


The Computer Assisted Mapping System (CAM) was a new research initiative for the 1991 
Census whose development ran concurrently with AR development. The system generated all 
the Enumeration Area maps within AMF coverage areas. This was a major departure from 
the manual map generation process of the past. CAM also provided a structure to EAs that 
located blockfaces within blocks and sequenced the blocks within the EA (step 13). An off- 
shoot to CAM for AR purposes was set up to sequence the dwellings on the blockface. This 
was necessary to organize the address lists in a manner corresponding more closely to the way 
the Census Representatives do their listing. 

CAM was fully implemented by the time of AR production. In order to remain compatible 
with it, the same vintage of the AMF that CAM employed was used. However, a small portion 
of blockfaces had no structure data assigned to them. For any EA where this percentage was 
greater than 5%, either CAM was re-executed for that worksite if time permitted or an alternate 
system, Point-in-Polygon Assignments (PIPA), that locates blockfaces within their EA was 
executed. Although PIPA shifted addresses from the structured portion of the AR booklet 
(based on blockface coding) to the unstructured portion (EA coding), at least the affected 
addresses were not dropped during the print selection process, which was the case when 
sequencing data were missing. 


4.10 Printing and Booklet Production (Step 14) 


The last production step was the printing and gathering of booklets (step 14) for the almost 
23,000 Enumeration Areas containing at this point 6.6 million addresses. Major concerns which 
were addressed included print speed and quality (a continuous-page printer was used), durability 
of booklets (the booklets had front and back covers and were stapled) and compilation costs 
(the booklets were gathered and attached in-house). 


5. POST-CENSAL EVALUATION


The post-censal evaluation can be broadly categorized into four study areas: field opera- 
tions, data capture of AR booklets, update of the AR and determination of the AR contribu- 
tion to coverage improvements. 

Evaluation of field operations will focus on the effectiveness of training, how complete the 
reconciliation work was, and causes of errors, with a view to improving the methodology for 
future Censuses. 

The data capture operation will yield two separate outputs. First, addresses printed in the 
booklets will be deleted if invalid, and if valid their Census Household Number will be captured. 
Second, the new addresses added by the Census Representatives will be captured. It will then 
be possible to calculate the AR overcoverage and undercoverage rates and the AR contribution 
to Census coverage. Addresses placed in the wrong EA can be investigated and traced back 
to the source of error. Through the Census Household Number, the number of persons added 
and characteristics of dwellings and persons can be studied. 

From a cost perspective, the unit cost per dwelling added by the AR will be calculated, in 
view of the cost of creating the AR and using it in the Census. 


6. FUTURE DIRECTIONS 


The Address Register (AR), although initially set up as one of the procedures for reducing 
Census undercoverage, is a developmental project with potential impact on other programs 
within Statistics Canada as well as other government agencies. 

The more immediate objectives for the future development of the AR are as follows: to incor- 
porate the addresses identified during Census enumeration; to evaluate the effectiveness of 
the AR in improving coverage of the 1991 Census; to document and evaluate the production 
activities; and to develop a longer-term plan for the AR addressing its cost-effectiveness as 
a household frame, the optimal updating strategy and its potential for use by external agencies. 

Within these guidelines, a project plan was prepared and is presented below under six main 
topic areas. 


6.1 Relationships between the Census and the Address Register 


Besides the potential for coverage improvement, other ways in which the AR could contribute 
to the Census will be explored. Some preliminary thoughts in this regard include possibilities 
for the AR to be used as a processing control file, for telephone numbers to be used for follow-up 
purposes, for creation of control numbers of dwellings in an Enumeration Area, for certifica- 
tion of dwelling counts for processing, or for migration analysis. Consideration will be given 
to whether the AR should be used before or after Census Day, and to how the AR might be 
used for those addresses where only a higher level of geography than the EA can be ascertained. 


6.2 Relationships between Geography and the Address Register 


As is evident in the description of the methodology, the creation of the AR relied heavily 
on many of the products from Geography Division (e.g., the Area Master File, the Postal Code 
Conversion File). Their contributions and limitations in building the AR will be reviewed. For 
any new products developed by Geography Division, their possible use in the AR will be 
investigated with a view to incorporating the AR needs directly into the new product. As well, 
the AR will be integrated into the Geography Division’s Geographical Information System 
(GIS). 

The AR may be able to provide update indicators to the Area Master File (AMF) or for
the delineation of Enumeration Areas. The AR could be used to establish priorities especially 
in high-growth areas or in areas where there are poor civic number ranges in the AMF. The 
updating of the Postal Code Conversion File might be served by postal code/Enumeration Area 
or postal code/blockface combinations from the AR. After each Census, all Census households 
are encoded with blockface centroids. Since the bulk of AR records have already been geocoded 
prior to the Census, a link of the AR with the Census Household Number will reduce the amount 
of manual geocoding work after the Census. This last project is already in progress. 


6.3 Documentation, Evaluation and Improvement of Procedures


A user guide documenting procedures and a technical guide to document programs, sample 
problems and solutions and quality assurance are being prepared for the work done to date. 

As with any new project, much is learned during the creative process and procedures are 
developed as required and as time and budget permit. After the fact, there are usually effi- 
ciencies to be gained by reviewing these procedures. 

For the automated procedures, projects already underway include a more efficient use of 
ORACLE or choice of another system, the use of desk-top computers rather than the Statistics 
Canada mainframe computer, standardization of the filter, enhancements to PAAS, 
amalgamation of sites into provincial databases, the dropping of some fields earlier in the 
process, consideration of other postal coding software, improvement of address place name 
matching and an improvement of the Area Master File linkage with French addresses. 

For the manual procedures, improved handling of adjacent Enumeration Areas across 
boundaries of Federal Electoral Districts and of the lack of civic numbers on CAM maps are 
to be pursued. The editing system to correct addresses will be reviewed for possible improve- 
ment as well. 

Telephone numbers were added at a later stage within the AR production. A thorough 
evaluation of their coverage and accuracy will be undertaken especially in view of the poten- 
tial uses of telephone numbers in the Census and other Statistics Canada surveys. For the latter, 
initial emphasis will be placed on testing within the context of the upcoming redesign of the 
Labour Force Survey. 

Computer systems developed for the initial production have already been cleaned up to a 
large extent for better efficiency of mainframe expenditures, for programs and disk and tape 
storage, for file manipulation, for output, libraries and file access. Better system controls will 
be prepared. 

This AR was produced only for urban areas. Future methodological development will 
examine the potential for extension to rural areas. 


6.4 Updating Methodology 


The AR was created from among four sets of administrative files: telephone files, municipal 
assessment files, hydro files and the T1 tax file from Revenue Canada. As well, the AR is 
currently being updated to be consistent with the 1991 Census so that the Census is also a source. 
The relative contributions of these source files, both in volume and quality, will be investigated 
so that a decision on acquisition of files for updating can be made. 

An integral part of the updating strategy is the development of a methodology for updating. 
The definition of an update will be needed along with an update system. The cost effectiveness 
of ongoing updating, dependent on the various needs which result from projects identified 
throughout these future directions, will be considered as well. Is ongoing updating cost effective 


140 Swain et al.: An Address Register for Coverage Improvement


when compared to updating only in time for the Census? What requirements will there be from 
other possible uses? Answers to these questions will lead to an updating strategy. 


6.5 Other Uses of the Address Register in Statistics Canada 


Besides the Census and geographical relationships presented earlier, a number of other uses 
are suggested within Statistics Canada. The potential use of the AR in the Labour Force Survey 
(LFS) will be investigated as part of the LFS Redesign Project. The possibility of using the 
AR in urban areas either to improve sampling under the existing area frame or as a list frame 
to reduce the number of stages in the sample design are two major areas highlighted for research. 
With telephone numbers on the AR, more telephone interviewing would be possible. 

The use of the AR as a survey frame for other Statistics Canada surveys will be examined. 
In addition, since the AR currently uses telephone files as a primary source of information, 
it has these files on hand for further exploitation. The Special Surveys Program, the General 
Social Survey and the existing Labour Force Survey are areas which use or require telephone 
files. 

Another potential application within Statistics Canada is as a housing database if the AR 
were enriched with housing data from the 1991 Census and data obtained from municipal 
assessment files, for example. The existence of such a database might reduce the amount of 
information on housing that would have to be collected in future Censuses. Data needs and 
availability have to be explored. 


6.6 Uses of the Address Register External to Statistics Canada 


If the AR is to be used outside Statistics Canada, issues of confidentiality of the source 
files and releasability of the AR must be addressed and meet the requirements of the Statistics 
Act. Some source files were provided to Statistics Canada in confidence, either contractually 
(e.g., some files from Alberta) or legally (the T1 file from Revenue Canada). 


6.7 Conclusion 


The breadth and diversity of the ideas contained above in future directions demonstrate the 
potential of the Address Register as a geographical product with applications in many areas 
of Statistics Canada and elsewhere. 


Bibliography on Capture-Recapture Modelling With
Application to Census Undercount Adjustment 


STEPHEN E. FIENBERG¹


ABSTRACT 


This article presents a selected annotated bibliography of the literature on capture-recapture (dual system) 
estimation of population size, on extensions to the basic methodology, and the application of these 
techniques in the context of census undercount estimation. 


KEY WORDS: Capture-recapture; Census undercount; Dual system estimation; Loglinear models. 


1. INTRODUCTION 


The method of capture-recapture for estimating the size of a closed population has been 
in use since at least the nineteenth century, when Peterson (1896) developed the standard 
estimator that bears his name for use with fish populations. Subsequent applications to other
types of populations include Geiger and Werner (1924) - physics; Lincoln (1930) - wildlife;
Chandrasekar and Deming (1949) - vital statistics for human populations; Wittes and Sidel
(1968), Wittes, Colton and Sidel (1974) - epidemiology; Sanathanan (1972b) - particle scanning
in physics; Blumenthal and Marcus (1975) - life testing; Green and Stollmack
(1981), Rossmo and Routledge (1990) - crimes and criminals. In the context of the study of
human populations and demography the method is often referred to as dual system estimation. 
We have included virtually no references to the related problem of counting the number of 
species, which goes back to the work of R.A. Fisher in the 1940s and had an elegant formulation 
in Efron and Thisted’s (1976) Biometrika paper on “How many words did Shakespeare
know?”.

The basic capture-recapture approach rests on a number of assumptions, e.g.: (1) the popula- 
tion under study is closed; (2) individuals (units) can be perfectly matched from capture to 
recapture; (3) capture probabilities are constant across the individuals (units) in the population; 
(4) the probability of inclusion of an individual (unit) in the recapture sample is independent of
inclusion in the original census or sample. Beginning in the late 1930s, various investigators began
to explore extensions that allowed for departures from the assumptions. These methods typically 
require additional data such as a second recapture (or even a third) and the full capture-recapture 
history of each individual. 
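
Under the four assumptions above, the basic two-sample estimator of population size is the product of the two sample counts divided by the number of individuals found in both. The short Python sketch below illustrates it; the function and argument names are hypothetical and the numbers in the usage note are invented for illustration.

```python
def dual_system_estimate(n_first: int, n_second: int, n_matched: int) -> float:
    """Basic two-sample (dual system) estimate of population size.

    n_first   -- individuals counted in the first list (e.g., the census)
    n_second  -- individuals counted in the second list (e.g., the recapture sample)
    n_matched -- individuals found in both lists
    """
    if n_matched <= 0:
        raise ValueError("at least one matched individual is required")
    if n_matched > min(n_first, n_second):
        raise ValueError("matches cannot exceed either list count")
    return n_first * n_second / n_matched
```

For example, if 200 individuals are counted in the first list, 150 in the second, and 30 appear in both, the estimate is 200 x 150 / 30 = 1,000. Departures from the assumptions, such as imperfect matching or unequal capture probabilities, bias this simple estimate, which is what motivates the extensions listed in the bibliography.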

For human populations and the study of vital statistics the methodology has long been linked 
to census data, e.g., see Tracy (1941) and Shapiro (1949, 1954). In connection with the 1950 
decennial census of population, the U.S. Bureau of the Census introduced the use of a sample 
matched to the census records for coverage evaluation. This approach has evolved into what 
is currently known as the Post Enumeration Survey approach to undercount and overcount 
estimation, and it has been the focal point of the recent and ongoing controversy over the possible
adjustment of the 1980 and 1990 censuses, e.g., see Ericksen and Kadane (1985); Freedman and
Navidi (1986, 1992); Freedman (1991); Wolter (1991). 

This selected annotated bibliography presents an overview of published literature on capture-
recapture estimation of population totals. It includes historical references, articles that explore 
departures from assumptions and extensions of the basic methodology, and is most complete 
in connection with papers that describe the dual and multiple system approaches in the context 
of census undercount estimation. In this regard, however, we have not included references to 
any of the unpublished memoranda and papers from the U.S. Bureau of the Census (primarily 
because most of these have been replicated in some form in the published literature). We have 
tended to exclude articles published in unrefereed proceedings for related reasons. Because the 
literature on specialized applications of capture-recapture techniques to wildlife populations 
is so extensive, and only some of it is of relevance for human populations, we have provided 
primarily references to reviews of this literature, e.g., see Brownie et al. (1977); Otis et al. (1978);
Seber (1973, 1982). Similarly we have included only a small number of references to the more 
specialized methods in use for life testing, e.g., see Dahiya and Blumenthal (1986), as well as 
those in use for software reliability applications, e.g., Jelinski and Moranda (1972), and Duran
and Wiorkowski (1981). The methods in this latter literature diverge in significant ways from 
those used in the basic capture-recapture and dual system approaches. 


2. SELECTED BIBLIOGRAPHY 


ALHO, J.M. (1990). Logistic regression in capture-recapture models. Biometrics, 46, 623-635.
• Extends usual dual systems approach to allow for multiplicative stratification effects.

BAKER, S.G. (1990). A simple EM algorithm for capture-recapture data with categorical covariates
(with discussion). Biometrics, 46, 1193-1200.
• Links cross-classification of covariates to the capture and recapture via loglinear models and then uses
EM algorithm to estimate population size.

BIEMER, P.P. (1988). Modelling matching error and its effect on estimates of census coverage error.
Survey Methodology, 14, 117-134.
• Develops models for evaluating impact of matching error on census coverage.

BISHOP, Y.M.M., FIENBERG, S.E., and HOLLAND, P.W. (1975). Discrete Multivariate Analysis:
Theory and Practice, Chapter 6. Cambridge, MA: MIT Press.
• Monograph on loglinear models which includes a chapter on the relationship to capture-recapture
models.

BLUMENTHAL, S., and MARCUS, R. (1975). Estimating population size with exponential failure.
Journal of the American Statistical Association, 70, 913-922.
• Uses exponential distribution to estimate population size based on a subset of observations obtained
by truncated sampling.

BOSWELL, M.T., BURNHAM, K.P., and PATIL, G.P. (1988). Role and use of composite sampling
and capture-recapture sampling in ecological studies. In Handbook of Statistics 6: Sampling,
(Eds. P.R. Krishnaiah and C.R. Rao). Amsterdam: North Holland, 469-488.
• Gives succinct summary of several basic variants on capture-recapture models and their estimation.

BROWNIE, C., ANDERSON, D.R., BURNHAM, K.P., and ROBSON, D.S. (1977). Statistical inference
from band recovery data: a handbook. U.S. Fish and Wildlife Service Resource Publication No. 131.
• Describes a comprehensive range of capture-recapture models and appropriate goodness-of-fit tests,
with emphasis on banding experiments.

BURGESS, R.D. (1988). Evaluation of reverse record check estimates of undercoverage in the Canadian
Census of Population. Survey Methodology, 14, 137-156.
• Describes the survey-based accounting approach of the reverse record check for undercount estimation.
Does not deal with issue of exclusion of individuals from census and other lists.

BURNHAM, K.P., ANDERSON, D.R., WHITE, G.C., BROWNIE, C., and POLLOCK, K.H. (1987).
Design and Analysis Methods for Fish Survival Experiments Based on Release-Recapture. Bethesda,
MD: American Fisheries Society.
• Combines methodology of Brownie et al. for band recovery with survival estimation under Jolly-Seber
mark-recapture models.

BURNHAM, K.P., and OVERTON, W.S. (1978). Estimation of the size of a closed population when
the capture probabilities vary among animals. Biometrika, 65, 625-633. Correction (1981) 68, 345.
• Develops a capture-recapture model with heterogeneity for animals but constant probabilities of capture
across samples. Model induces dependencies amongst captures.

CASTLEDINE, B.J. (1981). A Bayesian analysis of multiple-recapture sampling for a closed population.
Biometrika, 68, 197-210.
• Develops a Bayesian approach using beta priors for traditional independence-based Schnabel census
model for multiple recapture data.

CHAKRABORTY, P.N. (1963). On a method of estimating birth and death rates from several agencies.
Calcutta Statistical Association Bulletin, 12, 106-112.
• Extends Chandrasekar-Deming approach to three or more sources.

CHANDRASEKAR, C., and DEMING, W.E. (1949). On a method of estimating birth and death rates
and the extent of registration. Journal of the American Statistical Association, 44, 101-115.
• Develops dual-system technique and suggests the use of stratification for eliminating heterogeneity.
Applies approach to estimation of number of births and deaths in several Indian villages.

CHAO, A. (1987). Estimating the population size for capture-recapture data with unequal catchability.
Biometrics, 43, 783-791.
• Explores heterogeneous catchability model of Burnham and Overton using a moment inequality to
get a lower bound on population size.

CHAO, A. (1989). Estimating population size for sparse data in capture-recapture experiments.
Biometrics, 45, 427-438.
• Explores adequacy of estimator resulting from moment inequality for heterogeneous catchability model
in settings involving sparse data.

CHAPMAN, D.G. (1951). Some properties of the hypergeometric distribution with applications to
zoological sample censuses. University of California Publications in Statistics, 1, 131-160.
• Develops the hypergeometric sampling model for estimating the population size in capture-recapture
studies.

CHOI, C.Y., STEEL, D.G., and SKINNER, T.J. (1988). Adjusting the 1986 Australian census count
for under enumeration. Survey Methodology, 14, 173-189.
• Describes the use of dual system estimation and a post enumeration survey to adjust the results of
the Australian census. Also applies Wolter sex-ratio technique to check on sensitivity of dual system
estimator.

CHRISTENSEN, H.T. (1958). The method of record linkage applied to family data. Marriage and Family
Living, 20, 38-43.
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CITRO, C. F., and COHEN, M. L., (Eds.) (1985). The Bicentennial Census. New Directions for 
Methodology in 1990. Washington, DC: National Academy Press. 


e Report of a panel of the Committee on National Statistics on census methodology including an 
examination of the dual systems approach to undercount correction. 


COALE, A.J. (1961). The design of an experimental procedure for obtaining accurate vital statistics. 
International Population Conference, New York, 372-375. 


e¢ Proposes the use of two lists covering the same sample from a population. 


COHEN, M.L. (1990). Adjustment and reapportionment - analyzing the 1980 decision. Journal of Official 
Statistics, 6, 241-250. 


e Examines effect of bias and variability on accuracy of adjusted and unadjusted census counts and the 
impact on the reapportionment of the U.S. House of Representatives. 


CORMACK, R. M. (1981). Loglinear models for capture-recapture experiments on open populations. 
In The Mathematical Theory of the Dynamics of Biological Populations, I] (Eds. R.W. Hiorns and 
D. Cooke). London: Academic Press, 217-235. 


e Introduces Poisson model for capture-recapture and uses it with loglinear models to extend standard 
approach to allow for birth, death, and trap dependency. 


CORMACK, R. M. (1989). Log-linear models for capture-recapture. Biometrics, 45, 395-413. 


e Uses Poisson model and loglinear representation for inclusion of birth, death, and trap dependency 
into standard capture-recapture approach. 


CORMACK, R. M., and JUPP, P.E. (1991). Inference for Poisson and multinomial models for capture 
recapture experiments. Biometrika, 78, 911-916. 


¢ Compares MLEs of parameters under the two models and presents relationship between the corresponding 
asymptotic variances and covariances. 


COWAN, C.D., and MALEC, D.J. (1986). Capture-recapture models when both sources have clustered 
observations. Journal of the American Statistical Association, 81, 347-353. 


e¢ Extends dual systems approach to situation involving clustered observations as in the U.S. census 
coverage improvement program. 


CRESSIE, N. (1988). When are census counts improved by adjustment? Survey Methodology, 14, 191-208. 


¢ Proposes a PES-based model for undercount adjustment utilizing an empirical Bayes estimation scheme 
and a family of loss functions. 


CRESSIE, N. (1989). Empirical Bayes estimation of undercount in the decennial census. Journal of the 
American Statistical Association, 84, 1033-1044. 


¢ Develops and applies empirical Bayes smoothing methods for census adjustment factors produced from 
dual systems approach for geographic by demographic stratification. Applies approach to state data 
from 1980 U.S. census. 


CRESSIE, N., and DAJANI, A. (1991). Empirical Bayes estimation of U.S. undercount based on artificial 
populations. Journal of Official Statistics, 7, 57-67. 


• Shows that synthetic estimation approach used by Isaki et al. is special case of empirical Bayes. 


CROXFORD, A.A. (1968). Record linkage in education. In Record Linkage in Medicine (Ed. 
E.D. Acheson). London: E. and S. Livingstone, 351-356. 
DAHIYA, R.C., and BLUMENTHAL, S. (1986). Population or sample size estimation. In Encyclopedia 
of Statistical Sciences, (Volume 7), (Eds. S. Kotz and N.L. Johnson). New York: Wiley, 100-110. 


• Reviews theory underlying population size estimation from truncated sampling for discrete distributions 
and provides references to domains of application. 


DARROCH, J.N. (1958). The multiple-recapture census I: Estimation of a closed population. Biometrika, 
45, 343-359. 


• Describes the maximum likelihood approach to the multiple recapture problem under complete 
independence. 


DARROCH, J.N. (1959). The multiple-recapture census II: Estimation when there is immigration or 
death. Biometrika, 46, 336-351. 


• Extends the maximum likelihood approach under independence to open populations with either 
immigration or death. 


DARROCH, J.N. (1961). The two sample capture-recapture census when tagging and sampling are 
stratified. Biometrika, 48, 241-260. 


• Extends the maximum likelihood approach with independence to the situation where the original 
captured individuals are stratified into s groups and the individuals in the recapture sample are stratified, 
but according to t (possibly different) strata. 


DARROCH, J.N., FIENBERG, S.E., GLONEK, G.F.V., and JUNKER, B.W. (1992). A three-sample 
multiple-recapture approach to census population estimation with heterogeneous catchability. Submitted 
for publication. 


• Extends triple system estimation to allow for individual heterogeneity and selected forms of dependence. 
Applies estimators to triple system data from census dress rehearsal in St. Louis. 


DARROCH, J.N., and RATCLIFF, D. (1980). A note on capture-recapture estimation. Biometrics, 36, 
149-153. 


• Presents an alternative estimator for capture-recapture problems with interesting asymptotic properties. 


DASGUPTA, P. (1964). On the estimation of the total number of events and of the probabilities of 
detecting an event from information supplied by several agencies. Calcutta Statistical Association 
Bulletin, 13, 89-100. 


• Extends Chandrasekar-Deming approach to three or more sources. 


DAVIDSON, L. (1962). Retrieval of misspelled names in an airline passenger record system. Communications 
of the Association for Computing Machinery, 5, 169-171. 


DEMING, W.E., and KEYFITZ, N. (1967). Theory of surveys to estimate total populations. In 
Proceedings of the World Population Conference, Belgrade, 1965 (Vol. 3). New York: United Nations, 
141-144. 


• Extends Chandrasekar-Deming approach to three sources. 


DIFFENDAL, G. (1988). The 1986 test of adjustment related operations in Central Los Angeles County. 
Survey Methodology, 14, 71-86. 


• Describes implementation of post-enumeration survey approach to dual system estimation in a test 
census. 


DING, Y. (1990). Capture-Recapture Census with Uncertain Matching. Ph.D. dissertation, Department 
of Statistics, Carnegie Mellon University. 
• Develops a probabilistic matching model for use with dual and multiple system estimation, and considers 
a Bayesian approach for estimating the population size. Illustrates techniques using data from test 
census results from Los Angeles. 
DING, Y., and FIENBERG, S.E. (1992). Estimating population and census undercount in the presence 
of matching error. Submitted for publication. 


• Develops a probabilistic matching model for use with dual system estimation and illustrates its 
application to data from test census results from Los Angeles. 


DURAN, J.W., and WIORKOWSKI, J.J. (1981). Capture-recapture sampling for estimating software 
error content. IEEE Systems Engineering, 7, 147-148. 


EFRON, B., and THISTED, R.A. (1976). Estimating the number of unseen species: How many words 
did Shakespeare know? Biometrika, 63, 435-467. 


• Adapts a parametric model due to Fisher and a nonparametric model for the classical species problem 
using empirical Bayes methods. Applies approach to the vocabulary of Shakespeare. 


EL-KHORAZATY, M.N., IMREY, P.B., KOCH, G.G., and WELLS, H.B. (1977). Estimating the total 
number of events with data from multiple record systems: a review of methodological strategies. 
International Statistical Review, 45, 129-157. 


• Review of literature and methods for dual- and multiple systems estimation. Includes sections comparing 
use of techniques and departures from assumptions in wildlife and human populations. 


ERICKSEN, E.P., and KADANE, J.B. (1985). Estimating the population in a census year: 1980 and 
beyond (with discussion). Journal of the American Statistical Association, 80, 98-131. 


• Applies dual system approach to 1980 census data, including the regression-based smoothing of 
undercount estimates and the estimation of adjusted odds ratios using demographic estimates. 


ERICKSEN, E.P., KADANE, J.B., and TUKEY, J.W. (1989). Adjusting the 1980 census of population 
and housing. Journal of the American Statistical Association, 84, 927-944. 


• Presents revisions and extensions to the Ericksen and Kadane methodology and a critique of Freedman 
and Navidi. 


FAY, R. E., PASSEL, J. S., ROBINSON, J. G., and COWAN, C. D. (1988). The Coverage of Population 
in the 1980 Census. Bureau of the Census. Washington, DC: U. S. Department of Commerce. 


• Official Bureau of the Census report on attempts to measure undercount in the 1980 U.S. decennial 
census. 


FEIN, D.J., and WEST, K.K. (1988). The sources of census undercount: Findings from the 1986 Los 
Angeles Test Census. Survey Methodology, 14, 223-240. 


• Attempts to test hypotheses regarding the causes of census undercount for a hard-to-enumerate Hispanic 
urban population. 


FIENBERG, S.E. (1972). The multiple-recapture census for closed populations and the 2^k incomplete 
contingency table. Biometrika, 59, 591-603. 


• Introduces a method for estimating dependencies among multiple lists using loglinear models and 
develops a general approach for estimation using results on 2^k incomplete contingency tables and 
conditional estimation. 


FIENBERG, S.E. (1989). Undercount in the U.S. decennial census. In Encyclopedia of Statistical Sciences, 
(Supplemental Volume), (Eds. S. Kotz and N.L. Johnson). New York: Wiley, 181-185. 


• Presents historical background on the differential undercount of the U.S. population and brief 
descriptions of demographic analysis and the dual system estimation approaches. 


FREEDMAN, D. A. (1991). Policy forum: Adjusting the 1990 census. Science, 252, 1233-1236. 


• Critique of dual systems approach to adjustment of the 1990 census. 
FREEDMAN, D.A., and NAVIDI, W.C. (1986). Regression models and adjusting the 1980 census 
(with discussion). Statistical Science, 1, 3-39. 


• Critique of Ericksen and Kadane dual systems methodology as applied to 1980 census data. 


FREEDMAN, D.A., and NAVIDI, W.C. (1992). Should we have adjusted the census of 1980? (with 
discussion). Survey Methodology, this issue. 


• Continues critique of the use of dual system estimation and synthetic adjustment as applied to 1980 
census. 


GARTHWAITE, P.H., and BUCKLAND, S.T. (1990). Analysis of multiple-recapture census by 
computing conditional probabilities. Biometrics, 46, 231-238. 


• Uses a recursive relationship to generate point and interval estimates for multiple-recapture census under 
independence. 


GEIGER, H., and WERNER, A. (1924). Die Zahl der von Radium ausgesandten α-Teilchen. Zeitschrift 
für Physik, 21, 187-203. 


• Applies a capture-recapture method to radium ion particle detection estimation. 


GOLDBERG, J.D., and WITTES, J.T. (1978). The estimation of false negatives in medical screening. 
Biometrics, 34, 77-86. 


• Applies capture-recapture models to problems in medical screening. 


GOUDIE, I. B. J. (1990). A likelihood-based stopping rule for recapture debugging software reliability. 
Biometrika, 77, 203-206. 


GREEN, M.A., and STOLLMACK, S. (1981). Estimating the number of criminals. In Models in 
Quantitative Criminology, (Ed. J.A. Fox). New York: Academic Press, 1-24. 


GREENFIELD, C.C. (1975). On the estimation of a missing cell in a 2 x 2 contingency table. Journal 
of the Royal Statistical Society, Series A, 138, 51-61. 
• Introduces a non-zero value for the response correlation, by taking the mid-point of the range of 
permissible correlation values, and consequently derives a value for missing cell. Applies approach 
to census data from Malawi. 


GREENFIELD, C.C. (1976). A revised procedure for dual record systems in estimating vital events. 
Journal of the Royal Statistical Society, Series A, 139, 389-401. 


• Applies bounds on correlation in a 2 x 2 table to dual system estimation in the presence of event 
correlation induced by heterogeneity. 


GREENFIELD, C.C., and TAM, S.M. (1976). A simple approximation for the upper limit to the value 
of a missing cell in a 2 x 2 contingency table. Journal of the Royal Statistical Society, Series A, 139, 
96-103. 


• Uses approximation for upper bound for response correlation to derive an upper bound for missing cell. 


HOGAN, H., and WOLTER, K.M. (1988). Measuring accuracy in a post-enumeration survey. Survey 
Methodology, 14, 99-116. 


• Reports on Los Angeles Test of Adjustment Related Operations and estimates of sources of bias in 
post enumeration survey and census-based dual systems estimates. 


HOLST, L. (1973). Some limit theorems with applications in sampling theory. Annals of Statistics, 1, 
644-658. 
• Applies results on successive sampling to derive asymptotic distribution of usual Peterson estimator 
when there are heterogeneous capture probabilities or the effects of matching. 
HOOK, E., and REGAL R. (1982). Validity of Bernoulli census, log-linear, and truncated binomial models 
for correcting underestimates in prevalence studies. American Journal of Epidemiology, 116, 168-176. 


• Applies different loglinear related methods used to study the number of infants born with Down's 
syndrome. 


HUGGINS, R.M. (1989). On the statistical analysis of capture experiments. Biometrika, 76, 133-140. 
• Uses linear logistic models for capture probabilities for individuals and capture occasions. 


HUGGINS, R.M. (1991). Some practical aspects of a conditional likelihood approach to capture 
experiments. Biometrics, 47, 725-732. 


• Uses linear logistic models for capture probabilities and exploits temporal order of captures to introduce 
dependence amongst captures and on measurable covariates for those captured at least once. 


ISAKI, C.T. (1986). Bias of the dual system estimator and some alternatives. Communications in Statistics, 
Theory and Methods, 15, 1435-1450. 


• Exploits upper bound on correlation bias to reduce the bias of the dual system estimator. 


ISAKI, C.T., and SCHULTZ, L.K. (1986). Dual system estimation using demographic analysis data. 
Journal of Official Statistics, 2, 169-179. 


• Uses demographic analysis data to get revised dual system estimates for 1980 census using different 
models for correlation bias. 


ISAKI, C.T., and SCHULTZ, L.K. (1987). The effect of correlation and matching error in dual system 
estimation. Communications in Statistics, Theory and Methods, 16, 2405-2427. 


• Develops a simple matching error model in the presence of correlation bias to compare three dual system 
estimators. 


ISAKI, C.T., SCHULTZ, L.K., DIFFENDAL, G.J., and HUANG, E.T. (1988). On estimating census 
undercount in small areas. Journal of Official Statistics, 4, 95-112. 


• Develops simulation populations based on 1980 census and coverage evaluation results, evaluates 
regression-based synthetic undercount estimation methods, and shows superiority of synthetic 
approaches to raw census counts. 


JABINE, T.B., and BERSHAD, M.A. (1968). Some comments on the Chandrasekar and Deming 
technique for the measurement of population change. Paper presented at CENTO Symposium on 
Demographic Statistics, Karachi, Pakistan. 


• Shows that positive correlation bias produces a downward bias in estimate of total population size. 


JARO, M. (1989). Advances in record-linkage methodology as applied to matching the 1985 Test Census 
of Tampa, Florida. Journal of the American Statistical Association, 84, 414-420. 


• Describes census methodology for matching census and post-enumeration survey records, with the 
results from their application to 1985 test census. 


JELINSKI, Z., and MORANDA, P.B. (1972). Software reliability research. In Statistical Computer 
Performance Evaluation, (Ed. W. Freiberger). New York: Academic Press, 465-484. 


• Proposes a model with exponentially distributed failures to estimate total number of program faults 
based on times of occurrence of failures in fixed time period. 


JEWELL, W.S. (1985). Bayesian estimation of undetected errors. In Bayesian Statistics 2, 
(Eds. J.M. Bernardo, et al.). New York: Elsevier, 663-671. 


JOLLY, G.M. (1965). Explicit estimates from capture-recapture data with both death and immigration - 
stochastic models. Biometrika, 52, 225-247. 


• Estimation from multiple-recapture data for open populations. 
KADANE, J.B., MEYER, M.M., and TUKEY, J.W. (1992). Correlation bias in the presence of stratum 
heterogeneity. Submitted for publication. 


• Demonstrates the impact of correlation bias resulting from collapsing over heterogeneous strata with 
different catchability probabilities in each stratum, subject to a monotonicity constraint. 


KROTKI, K.J. (Ed.) (1978). Developments in Dual System Estimation of Population Size and Growth. 
Edmonton: University of Alberta Press. 


• Reviews the use of dual system estimation for vital records in various countries. Includes technical 
details on the use of complex samples and elaborations on basic techniques. 


LASKA, E.M., MEISNER, M., and SIEGEL, C. (1988). Estimating the size of a population from a 
single sample. Biometrics, 44, 461-472. Correction, (1989), 45, 1347. 


• Estimates population size from the last of k lists. 


LEWIS, C.E., and HASSANEIN, K.M. (1969). The relative effectiveness of different approaches to 
the surveillance of infection among hospitalized patients. Medical Care, 8, 379-384. 


• Applies dual system estimation to surveillance of infectious diseases. 


LINCOLN, F.C. (1930). Calculating waterfowl abundance on the basis of banding returns. Circular of 
the U.S. Department of Agriculture, 118, 1-4. 


• Applies capture-recapture method to estimating size of waterfowl populations. 


MANTEL, N. (1951). Evaluation of a class of diagnostic tests. Biometrics, 7, 240-246. 


• Shows how heterogeneity induces correlation bias (event correlation) in the estimation of disease 
prevalence. 


MARKS, E.S., SELTZER, W., and KROTKI, K.J. (1974). Population Growth Estimation: A Handbook 
of Vital Statistics Measurement. New York: Population Council. 


• Comprehensive review of dual-systems estimation, assumptions, background, design, and problems. 
Contains claim that the basic method has been used for more than three centuries for estimating size 
of animal populations. 


MAXIM, L.D., HARRINGTON, L., and KENNEDY, M. (1981). A capture-recapture approach for 
estimation of detection probabilities in aerial surveys. Photogrammetric Engineering and Remote 
Sensing, 47, 779-788. 


MULRY, M. H., and SPENCER, B.D. (1988). Total error in the dual system estimator: the 1986 census 
of Central Los Angeles County. Survey Methodology, 14, 241-263. 


• Develops a total error model for dual systems approach applied to Los Angeles Test of Adjustment 
Related Operations. 


MULRY, M. H., and SPENCER, B.D. (1991). Total error in PES estimates of population (with discussion). 
Journal of the American Statistical Association, 86, 839-863. 


• Extends earlier Mulry-Spencer development of total error model for dual systems approach and applies 
it to 1988 dress rehearsal census in St. Louis and east-central Missouri. 


NICHOLS, J.D., and POLLOCK, K.H. (1983). Estimating taxonomic diversity, extinction rates, and 
speciation rates from fossil data using capture-recapture models. Paleobiology, 9, 150-163. 


OTIS, D.L., BURNHAM, K.P., WHITE, G.C., and ANDERSON, D.R. (1978). Statistical inference from 
capture data on closed animal populations. Wildlife Monograph, 62, Washington, DC: Wildlife Society. 


• Reviews capture-recapture and related methods for wildlife populations. 
PERKINS, W.M., and JONES, C.D. (1965). Matching for census coverage checks. Paper presented at 
the Meetings of the American Statistical Association, Philadelphia. 


PETERSON, C.G.J. (1896). The yearly immigration of young plaice into the Limfjord from the German 
Sea. Report of the Danish Biological Station to the Ministry of Fisheries, 6, 1-48. 


• Classic development of method of capture-recapture and its application to the estimation of the size 
of fish populations. 


POLLACK, E.S. (1965). Use of census matching for study of psychiatric admission rates. Proceedings 
of the Social Statistics Section, American Statistical Association, 107-115. 


RAJ, D. (1977). On estimating the number of vital events in demographic surveys. Journal of the American 
Statistical Association, 72, 377-381. 


• Develops a formula for bias in dual system estimator under a general model of response errors and 
explores use of double sampling to correct bias. 


ROSSMO, D.K., and ROUTLEDGE, R. (1990). Estimating the size of criminal populations. Journal 
of Quantitative Criminology, 6, 293-314. 


RUBIN, D.B., SCHAFER, J.L., and SCHENKER, N. (1988). Imputation strategies for missing values 
in post-enumeration surveys. Survey Methodology, 14, 209-221. 


• Presents a methodology matching for undercount estimation which utilizes an imputation approach 
rooted in loglinear models in the presence of missing data. 


SANATHANAN, L.P. (1972a). Estimating the size of a multinomial population. Annals of Mathematical 
Statistics, 43, 142-152. 


• Demonstrates asymptotic equivalence of conditional and unconditional estimators for the population 
size. 


SANATHANAN, L.P. (1972b). Models and estimation methods in visual scanning experiments. 
Technometrics, 14, 813-829. 


• Develops latent model to estimate the number of particles in scanning records which allows for 
differential detectability and induces dependencies amongst detectors. 


SANATHANAN, L.P. (1973). A comparison of some models in visual scanning experiments. 
Technometrics, 15, 67-78. 


• Applies traditional capture-recapture model and latent models to data from actual visual scanning 
experiments. 


SANDLAND, R.L., and CORMACK, R.M. (1984). Statistical inference for Poisson and multinomial 
models for capture-recapture experiments. Biometrika, 71, 27-33. 


• Shows relationship between the asymptotic variances of the population size under general capture- 
recapture model for the two alternate sampling schemes. 


SCHENKER, N. (1988). Handling missing data in coverage estimation with application to the 1986 Test 
of Adjustment Related Operations. Survey Methodology, 14, 87-98. 


• Examines effect of missing data on dual system estimate applied to test census data. 


SCHIRM, A.L., and PRESTON, S.H. (1987). Census undercount adjustment and the quality of 
geographic population distributions (with discussion). Journal of the American Statistical Association, 
82, 965-990. 

• Applies synthetic estimation approaches to 1980 census data to evaluate the impact of undercount 
estimation. 


Survey Methodology, June 1992 153 


SCHNABEL, Z.E. (1938). The estimation of the total fish population of a lake. American Mathematical 
Monthly, 45, 348-352. 


• Extends the basic capture-recapture approach to multiple recaptures, with information at each recapture 
on whether individuals were captured previously. 


SCOTT, C. (1974). The dual record (PGE) system for vital rate measurement, some suggestions for further 
development. In International Population Conference, Liege, 1973, (Volume 2). Liege: International 
Union for the Scientific Study of Population. 


SEBER, G.A.F. (1965). A note on the multiple-recapture census. Biometrika, 52, 249-259. 
• Estimation from multiple-recapture data for open populations with time-specific parameters. 


SEBER, G.A.F. (1973). The Estimation of Animal Abundance and Related Parameters. New York: 
Hafner. Second edition (1982). New York: Macmillan. 


• Contains an up-to-date review of capture-recapture techniques and their extensions for animal 
populations, with emphasis on applications. 


SEBER, G.A.F. (1982). Capture-recapture methods. In Encyclopedia of Statistical Sciences, (Volume 1), 
A Variation of the Housing Unit Method for Estimating 
the Population of Small, Rural Areas: 
A Case Study of the Local Expert Procedure 


LINDA K. ROE, JOHN F. CARLSON and DAVID A. SWANSON¹ 


ABSTRACT 


This paper examines the suitability of a survey-based procedure for estimating populations in small, rural 
areas. The procedure is a variation of the Housing Unit Method. It uses local experts enlisted 
to provide information about the demographic characteristics of households randomly selected from 
residential unit sample frames developed from utility records. The procedure is nonintrusive and less 
costly than traditional survey data collection efforts. Because the procedure is based on random sampling, 
confidence intervals can be constructed around the population estimated by the technique. The results 
of a case study are provided in which the total population is estimated for three unincorporated 
communities in rural, southern Nevada. 


KEY WORDS: Survey-based; Utility records; Confidence intervals; Nevada. 


1. INTRODUCTION 


In its most recent survey of state and local agencies preparing population and housing 
estimates, the U.S. Bureau of the Census found that about 89 percent of the agencies surveyed 
use the Housing Unit Method (HUM) (Byerly 1990). This method was also found to be widely 
used in an earlier survey (U.S. Bureau of the Census 1978). The method has been found to 
provide accurate estimates of the total population (Lowe, Pittenger and Walker 1977; Lowe, 
Weisser and Myers 1984; Smith and Lewis 1980, 1983; Smith and Mandell 1984) as well as a 
strong conceptual and practical foundation for a municipal estimation system (Martin and 
Serow 1979; Rives and Serow 1984; Swanson, Baker and Van Patten 1983). 

One of the strong features of the HUM is that it can be implemented in a variety of forms, 
which allows it to be adapted to a range of data environments (Swanson, Baker and Van Patten 
1983). This adaptability has been exploited primarily by subnational demographic centers for 
purposes of revenue sharing and related programs (Martin and Serow 1978; Swanson, Baker 
and Van Patten 1983). However, as pointed out by Rives (1982), the method has potential uses 
in other arenas. 

As an example, consider the case of environmental impact statements. Concerns over legal 
and environmental issues have resulted in decisions to locate unpopular facilities in sparsely 
populated rural areas for which census and other socioeconomic data are usually not available 
(Freudenburg 1982; Brown, Geertsen and Krannich 1989; Munsell 1988). As a consequence, 
it has become necessary to develop methods of inquiry, particularly suited for small, rural areas, 
that fully exploit available data, are less costly and, in many cases, less intrusive, than area, 
telephone, and mail surveys. We believe that the variation of the HUM that we propose in this 
paper contributes to this type of methodological development. 


¹ Linda K. Roe and John F. Carlson, Science Applications International Corporation, 101 Convention Center Drive, 
Las Vegas, Nevada 89109; David A. Swanson, Center for Social Research and Department of Sociology, Pacific 
Lutheran University, Tacoma, Washington 98447-0003, USA. 
The HUM variation that we describe in this paper combines two methods that are, in 
themselves, well known. However, they have largely been developed in isolation from each 
other, as well as from the HUM. These are: (1) random sampling; and (2) “local expert” 
interviews. As discussed later, these methods, combined with the HUM, may lead to a means 
of obtaining the population size and, eventually, composition data required to meet the 
information needs of impact assessment projects and other activities affecting small, rural areas. 


2. CONSIDERATIONS IN ASSESSING IMPACTS IN SMALL, 
RURAL AREAS 


The location of new plants or industries in rural areas generally requires a work force 
exceeding that which is available in the local area. Population growth in the communities that 
are in close proximity to the site can be expected to vary according to the size of the project 
and the number of employees that will be hired to build, then to operate and maintain the 
completed facility. Whether rapid increases in the overall number of individuals are expected, or 
significant changes in the age and sex distribution, the altered population structure will have 
an effect on the type and amount of public services needed (Summers 1982). Thus, impact 
assessments require information regarding anticipated increases in school enrollment, housing 
requirements, health care needs, and other services. Before such projections can be made, 
however, information on the current population in the impacted area must be obtained in order 
to have a “jump-off” or “launch” population for forecasting purposes (Carlson, Williams 
and Swanson 1990; Pittenger 1976; U.S. Department of Energy 1988). 

The understanding of major factors affecting the distribution of people in isolated rural 
areas is critical in constructing demographic profiles and projections. These communities are 
likely to have been affected by periods of both boom growth and decline (Krannich and Greider 
1984). Historical patterns of population change, as well as current trends, may differ 
substantially from averages derived for the county as a whole or even other sub-county areas. 
This presents a special problem because accurate demographic information is usually available 
only for years in which the Federal Census is conducted. However, census data, including 
information on households, are not typically available for unincorporated places with small 
populations. Since cost is usually a major factor, the possibility of conducting special censuses or large 
sample surveys, particularly on a regular basis, is often precluded, even in small, rural areas. 
An additional problem associated with such counts is that they require interviewers to contact 
individual households, which imposes on time and privacy and adds to disruption burdens that 
may be already high for local residents (Brown, Geertsen and Krannich 1989; Krannich, Berry 
and Greider 1989; Schleifer 1986). 

The estimation of the size of the current population of a small, rural area could, in 
principle, be accomplished through several techniques. However, data limitations and a desire for 
accuracy severely curtail the range of candidates and, realistically, point to a single technique: 
HUM (Smith 1986; Smith and Mandell 1984; Lowe, Weisser and Myers 1984; Swanson, Van 
Patten and Baker 1983; Smith and Lewis 1983, 1980). 


3. THE HOUSING UNIT METHOD 


The concept of the HUM relies on the fact that nearly everyone sleeps under some kind of 
shelter. The U.S. Bureau of the Census, for example, chooses to define two classes of shelters: 
group quarters; and housing units. All persons are assigned to one shelter class or the other. 
The HUM holds that these shelters can be identified, counted, and classified as occupied or 
vacant. Also, all occupied shelters must have a specific number of occupants. Therefore, the 
population of any given place must equal the number of housing units times the occupancy 
rate times the average number of persons per occupied housing unit (household), plus the number 
of persons in group quarters. The four elements of the HUM provide an exact demographic 
identity, with the population of a given place given by 


$$P = (H) \times (O) \times (PPH) + GQ,$$
where 


P = total population, 
H = total housing units, 
O = proportion of occupied units, 
PPH = mean number of persons per household, 
GQ = group quarters population. 
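
As a minimal sketch of the identity above, the following Python function multiplies out the four components. The function name, argument names, and the numbers in the usage line are hypothetical and chosen only for illustration; they are not taken from the case study.

```python
def hum_population(housing_units, occupancy_rate, persons_per_household,
                   group_quarters=0.0):
    """Housing Unit Method identity: P = H * O * PPH + GQ.

    housing_units          H   total housing units
    occupancy_rate         O   proportion of units occupied (0 to 1)
    persons_per_household  PPH mean persons per occupied unit
    group_quarters         GQ  persons living in group quarters
    """
    if housing_units < 0 or persons_per_household < 0 or group_quarters < 0:
        raise ValueError("counts and household size must be non-negative")
    if not 0.0 <= occupancy_rate <= 1.0:
        raise ValueError("occupancy_rate must be a proportion between 0 and 1")
    return housing_units * occupancy_rate * persons_per_household + group_quarters


# Hypothetical inputs, for illustration only.
example_estimate = hum_population(1000, 0.9, 2.5, group_quarters=40)
```

Because the identity is exact, any error in the resulting estimate comes entirely from error in the four inputs.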


The key accuracy issue in using the HUM is in the determination of each of the components. 
Moreover, as Smith (1986, pp. 245-246) observes: 


“The Housing Unit Method is a robust, comprehensive, and extremely flexible 
form of population estimation with a number of characteristics that make 
it useful for small-area analysis. It is not confined to a single technique or 
type of data; rather, it can utilize a number of different techniques and data 
sources, including those that may be applicable in one area but not another.” 


As also noted by Smith (1986), there are two major approaches used to generate the “number 
of households” element of the HUM. One relies on measures of construction activity and the 
estimation of an occupancy rate; the other uses utility data, such as residential electrical 
customers. A major advantage of the second approach is that it can directly provide the number 
of households, which eliminates or substantially reduces a number of potential data inaccuracies, 
including the need to estimate time lags between when permits are issued and units 
are completed, completion rates, demolitions, conversions, and occupancy rates. Starsinic and 
Zitter (1968) as well as Rives and Serow (1984) find that the “utility data” approach to the 
HUM is advantageous, although they also acknowledge certain limitations. 

Another advantage of using utility data is that the same data used to obtain total households 
can also be used as a complete frame from which samples can be drawn in order to obtain an 
estimate of the average number of persons per household (PPH). Traditional data collection for 
this type of sample information usually takes one of three forms: mail, 
telephone, or personal interview. We propose that in their place “local experts” be used to 
minimize both cost and disruption burdens. 
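
The use of the utility list as a sampling frame can be illustrated with a short Python sketch. It assumes a simple random sample of households drawn without replacement from a frame of known size, a finite population correction, and a normal-approximation interval; these are textbook assumptions made here for illustration and are not necessarily the design used in the case study. All names are hypothetical.

```python
import math
import statistics


def estimate_population_from_sample(total_households, sampled_household_sizes,
                                    group_quarters=0.0, z=1.96):
    """Estimate total population from a simple random sample of household sizes.

    total_households         number of households on the utility frame (assumed known)
    sampled_household_sizes  persons reported for each sampled household
    group_quarters           persons in group quarters, added without sampling error
    z                        normal critical value (1.96 for roughly 95 percent)

    Returns (point_estimate, (lower_bound, upper_bound)).
    """
    n = len(sampled_household_sizes)
    if n < 2:
        raise ValueError("need at least two sampled households")
    if n > total_households:
        raise ValueError("sample cannot exceed the frame size")

    mean_pph = statistics.mean(sampled_household_sizes)
    sample_variance = statistics.variance(sampled_household_sizes)

    # Finite population correction for sampling without replacement.
    fpc = 1.0 - n / total_households
    se_total = total_households * math.sqrt(fpc * sample_variance / n)

    point = total_households * mean_pph + group_quarters
    return point, (point - z * se_total, point + z * se_total)
```

With household sizes supplied by local experts rather than by the residents themselves, the same calculation would apply, although any reporting error by the experts is not reflected in the interval.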


4. LOCAL EXPERTS 


The local expert procedure (also referred to as the key informant procedure) of obtaining 
information about a community is well-established in the field of cultural anthropology. It 
is generally acknowledged as a “reliance on a small number of knowledgeable participants, 
who observe and articulate social relationships for the researcher” (Seidler 1974, p. 816). 
Further, Poggie (1972) finds that when the questions asked in the field relate to noncontroversial, 
concrete, and directly observable public phenomena, local experts are a highly reliable 
and precise source of information. 
There are two key issues in using the local expert procedure in conjunction with utility records 
and the HUM. The first is to identify and recruit people who are truly local experts on the 
composition of the households presented to them in the sample. The second is to be able to obtain 
household identifying information that is familiar to the local experts (e.g., a street address 
and the name of the householder instead of a utility company billing code). 


5. CASE STUDY 


The data collection activity on which our population estimates rely is part of a program to 
assess the socioeconomic characteristics of communities located near Yucca Mountain, Nevada, 
the proposed site of a geologic nuclear waste repository (U.S. Department of Energy 1988). 
The data will comprise part of the set used in a comprehensive impact analysis of the proposed 
repository. 

Yucca Mountain is located in Nye county, approximately 90 miles northwest of Las Vegas 
in a sparsely populated, desert area. The impact analysis is focused on the communities that 
are within a fifty mile radius of the Yucca Mountain site. The study area includes the
unincorporated communities of Amargosa Valley, Beatty, and Pahrump in southern Nye county
and Indian Springs in Clark county. Tax boundaries specified by the county commissioners
are used to delineate community boundaries for purposes of the impact analysis.


6. DATA AND METHODS 


During a preliminary phase of the research, contacts were made with community leaders 
and residents. These contacts resulted in a network that later facilitated the collection of data. 
Field notes were taken describing the general layout of each community in the study area. These 
included the types and locations of businesses and residential areas. Four separate housing types 
were defined using the guidelines developed by the U.S. Bureau of the Census. 

Following the preliminary investigation, the road system and other features were mapped 
for each community. Using these maps and utility records, representatives of the electrical com- 
pany servicing southern Nye county identified the location and type of housing, if any, 
associated with all current electrical connections. This information was added to the housing 
unit file constructed from the utility records for each community. Because of the lack of ade- 
quate utility records for Indian Springs, housing information for this area was collected by 
a ‘‘windshield survey,’’ a systematic, block-by-block canvassing of housing units by teams 
operating from automobiles (Lowe, Pittenger and Walker 1977). As a consequence, Indian 
Springs is not included in the test results reported in this paper. 

The preliminary fieldwork indicated that substantial differences in PPH could be expected 
across the communities in the study area. Thus, a random selection of units from the housing 
unit file was drawn separately for each community, based on the number of housing units in 
each community. A conservative approach was used to determine the size of each community’s 
sample. It assumed a 5% margin of error, a significance level of .05 and interest in a 
dichotomous variable with a 50-50 distribution (Cochran 1977). Once the initial size was deter- 
mined, an additional 15% was added to allow for missing cases. The final sample size for 
Amargosa Valley was 175 housing units, for Beatty it was 222, and for Pahrump, 355. 
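
As a rough illustration of the sample size rule described above, the following Python sketch computes an initial sample size for a proportion, applies a finite population adjustment, and then adds the 15% allowance. The function name, the specific finite-population formula (a standard textbook form, assumed here rather than taken from the text), and the frame of 1,000 units are all assumptions made for illustration. The frame counts in use when the samples were drawn are not reported, so the sketch is not expected to reproduce the final sample sizes given above.

```python
import math


def initial_sample_size(frame_size, margin=0.05, z=1.96, p=0.5, allowance=0.15):
    """Sample size for estimating a proportion, inflated for missing cases.

    Assumed formula: n0 = z^2 * p * (1 - p) / margin^2, adjusted for a finite
    frame as n = n0 / (1 + (n0 - 1) / frame_size).
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / frame_size)
    # Add the allowance for missing cases and round up to a whole unit.
    return math.ceil(n * (1 + allowance))


# Hypothetical frame of 1,000 housing units (not a figure from the study).
print(initial_sample_size(1000))
```

The 50-50 split is the conservative choice because p(1 - p) is largest at p = 0.5, so the resulting sample size is an upper bound for any other proportion.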

Local experts were initially identified through the contact network on the basis of their 
experience in community activities and their familiarity with local residents. Each potential 
expert was interviewed and asked to complete a form designed to assess their qualifications. 
A written explanation of the project and specific instructions regarding the data collection 
process were provided and discussed. The persons selected as local experts were given
instructions regarding confidentiality. For this project, we found that the ‘‘meter readers’’ employed
by the local utilities constituted a good source of local experts. The local experts were provided 
with the sample set of housing units for their respective communities. In most cases, two local 
experts worked together, which made it possible to verify the accuracy of information as it 
was recorded. For each unit, the local experts communicated to the researcher only the number 
of persons in the household as of July 15, 1990, the age (using eight age groups) and gender 
of each household member, and the retirement status of any member fifty years of age and 
over. If either of the two local experts was unsure about the composition of a given household, 
another member of the community was contacted to confirm the data. In the case where the 
composition of any part of the household could not be confirmed, ‘‘data unknown’’ was
recorded for the entire unit. The data were recorded on a form that listed and identified the 
sample units by an attribute number (designated according to location on the housing unit map), 
the electrical meter number assigned to the unit, and the type of housing unit. All residential 
units, including those identified as ‘‘burned down’’ or otherwise destroyed, unoccupied or 
‘‘removed from pad’’ (in the case of mobile homes and trailers) are considered part of the final
sample. Units identified as ‘‘not a residence’’ were eliminated from the frame and not included 
in the sample. There were a few units for which data were unknown. These units are not included 
in the final sample, which may cause some slight bias. 


7. RESULTS 


The first data product is the number of households, which is derived directly from the active 
meter records, screened and classified by utility company personnel. Table 1 displays these 
figures by community along with other results that are discussed later. 

Table 1 also provides the estimated PPH, which is computed from the aggregate number of persons
identified in the occupied sample units by the local experts. Also found in this table is the
estimated household population of each community, which was found by applying the HUM 
formula to the household and PPH components. There were no group quarters identified in 
any of the communities. 
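
The housing unit method calculation behind these figures can be sketched in a few lines of Python. The function name is an assumption introduced here; the household counts and PPH values are those read from Table 1, and the group quarters term is zero because none were identified.

```python
def hum_population(households, pph, group_quarters=0):
    """Housing unit method: households times persons per household,
    plus any group quarters population."""
    return households * pph + group_quarters


# (households, estimated PPH) by community, as read from Table 1.
communities = {
    "Amargosa Valley": (326, 2.58),
    "Beatty": (672, 2.43),
    "Pahrump": (3224, 2.23),
}

estimates = {name: round(hum_population(n, pph)) for name, (n, pph) in communities.items()}
print(estimates)
```

Rounded to whole persons, the products agree with the estimated household populations of 841, 1,633 and 7,190.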


Table 1

Sample Characteristics and Results of the Accuracy Test*

                                    Estimated 1990             1990       95% Confidence
                                                              Census         Interval
Community         Households    PPH     SE    Population      Count        Low      High

Amargosa Valley         326    2.58    .11         841          853        771       911
Beatty                  672    2.43    .10       1,633        1,623      1,501     1,765
Pahrump               3,224    2.23    .06       7,190   [illegible]     6,810     7,569


* The Estimated data and confidence intervals are produced by the procedures described in the text. The 1990 census
counts are taken from Table 3 in the ‘‘1990 Census Extract, Nevada, Public Law 94-171 Data,’’ dated February 11,
1991 and distributed by Betty McNeal, Nevada State Data Center Librarian, Nevada State Library and Archives,
Capitol Complex, Carson City, Nevada 89710. The count for the area ‘‘Amargosa Valley’’ is made up of the 1990
population reported for Nye county’s Amargosa Valley Division (761) and Crystal Division (92). The count for
the area ‘‘Beatty’’ is taken from the Beatty Census Designated Place and the count for the area ‘‘Pahrump’’ is taken
from the Pahrump Division of Nye county.


8. MEASURING UNCERTAINTY IN THE ESTIMATES


One major advantage of estimates based on random sampling is that confidence intervals 
can be generated. Rives (1982) advocates this approach. However, he did not consider the use 
of local experts and believed that his suggestion would only be followed in exceptional 
circumstances because of the high expense associated with traditional surveys. This was also 
noted by Morrison (1982) and Lee and Goldsmith (1982) in their critical review of Rives’ 
suggestion. 

In the case of the local expert procedure, the ‘‘statistic’’ is the PPH value, which in practice
would vary from sample to sample depending on the variation in PPH values. Our interest
is less in the PPH values than in the estimate of population, however, so we use a simple 
transformation introduced by Espenshade and Tayman (1982) and used more recently by 
Swanson (1989) to place the confidence interval originally generated for a given community’s 
PPH value around each of the community population estimates. 


Let

    P   = estimated household population,
    N   = number of households,
    PPH = estimated persons per household.

Then

$$\text{lower limit}(P) = N \times \bigl(\mathrm{PPH} - t_{n-2,\,\alpha/2} \times se\bigr),$$

$$\text{upper limit}(P) = N \times \bigl(\mathrm{PPH} + t_{n-2,\,\alpha/2} \times se\bigr),$$

where

    $n$ = number of households sampled,
    $\alpha$ = level of significance desired,
    $se$ = standard error of the estimated PPH,
    $t_{n-2,\,\alpha/2}$ = upper $(\alpha/2)100$th percentile of the $t$ distribution with $(n - 2)$ degrees of freedom.


As an example, using a significance level of .05, the corresponding 95% confidence interval 
for the estimated 1990 population of Pahrump (7,190) is 


lower limit = 6,810 = (3,224) * (2.23 — (1.96 * .06)), 
upper limit = 7,569 = (3,224) * (2.23 + (1.96 * .06)). 
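
The worked example above can be checked with a short Python sketch. The function name and argument names are assumptions introduced for illustration; the critical value 1.96 is the one used in the example.

```python
def population_interval(households, pph, se, critical_value=1.96):
    """Scale the confidence interval for PPH up to the population estimate."""
    half_width = critical_value * se
    lower = households * (pph - half_width)
    upper = households * (pph + half_width)
    return round(lower), round(upper)


# Pahrump: 3,224 households, PPH 2.23, standard error .06.
pahrump = population_interval(3224, 2.23, 0.06)
print(pahrump)
```

For Pahrump this returns the limits 6,810 and 7,569. Because the number of households is treated as a known constant, all of the sampling uncertainty in the interval comes from the PPH estimate.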


9. TEST OF ACCURACY 


Before turning to the test results, which are also included in Table 1, some data qualifica- 
tions require discussion. The single most problematic issue in terms of comparing the estimates 
with the 1990 census results lies in the fact that the Bureau of the Census does not recognize 
the ‘‘tax districts’’ as administrative boundaries for the communities in the study area. This
means that the Bureau’s ‘‘statistical’’ geography must be used, which requires some adjustments
so that the geography used for purposes of the impact analysis matches that used by the Bureau. 

In terms of these adjustments, the area identified as Amargosa Valley for purposes of the
impact study is known to vary from the Amargosa Valley Census Division of Nye county used 
by the Bureau in that the study’s definition includes the Crystal Census Division of Nye county. 
Fortunately, this is a case where two pieces of statistical geography used by the Bureau can 
be combined to virtually match that used in the impact study. Thus, the 1990 census population 
counts shown in Table 1 for the Amargosa Valley include the Crystal Division along with the 
Amargosa Valley Division. ‘‘Beatty’’ is another area that is known to vary in terms of 
geography. It is identified as both a Census Designated Place and as the Beatty Census Division 
of Nye county by the Bureau. In this situation, it is the Census Designated Place that cor- 
responds very closely to the definition of Beatty used in the impact study. Thus, the 1990 census 
population count for Beatty shown in Table 1 is for the Beatty CDP. 

The third community, Pahrump, is identified as a Census Division of Nye county. This piece 
of statistical geography used by the Bureau is virtually identical to that used in the impact study. 
Consequently, the 1990 census population found in Table 1 for Pahrump is that given for this 
division of Nye county. 

There are other differences between the estimates and the 1990 census figures. The official 
date of the census count is April 1st; the estimates are for July 15th. In terms of this difference,
seasonal effects are believed to be very slight for the communities in question. With the exception 
of the outflow of some ‘‘snowbirds,’’ who may have been counted in the study area because 
they had no usual residence elsewhere, there were no known migration streams of any conse- 
quence between April and July. Similarly, the other components of population change were 
slight. 

Had the Bureau found transient persons with no usual residence elsewhere, the estimation 
procedure is likely to have missed them. These differences would also impact housing unit 
counts. If a transient person, identified as a resident for purposes of the decennial census, is 
found in a recreational vehicle it would be included in the community’s ‘‘other’’ housing stock 
by the Bureau. Such accommodations would not be included in the data derived from the 
residential electrical meter records. 

We believe, however, that such instances are rare and, further, that the test results are not 
confounded by comparing a household population with a population that resides mainly in 
households but also, to some extent, in group quarters. 

The results of the test of accuracy are also summarized in Table 1, along with the ‘‘low’’
and ‘‘high’’ estimates corresponding to the 95 percent confidence interval placed around each
community’s estimated population. The estimated population is very close to the population 
reported by the Bureau. Overall, the mean absolute difference is 86 persons and the mean 
absolute percent difference is 1.7. 

The three confidence intervals contain the 1990 census population in each of the three com- 
munities, respectively. This finding is of special interest because the intervals are relatively 
narrow for a 95 percent level of confidence. On average, the half-width, measured from the
estimated population to either boundary, is 7.2 percent of the estimated population. This
suggests that confidence intervals constructed around the estimates derived from this varia- 
tion of the HUM are meaningful, even in the presence of some unknown level of nonsampling 
error. 
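
The 7.2 percent figure can be reproduced from the estimates and interval limits in Table 1. The sketch below is a minimal check in Python; the variable and function names are assumptions introduced here.

```python
# (estimated population, low, high) for each community, as given in Table 1.
intervals = {
    "Amargosa Valley": (841, 771, 911),
    "Beatty": (1633, 1501, 1765),
    "Pahrump": (7190, 6810, 7569),
}


def relative_half_width(estimate, low, high):
    """Half the interval width, as a percentage of the point estimate."""
    return 100 * (high - low) / 2 / estimate


percents = [relative_half_width(*row) for row in intervals.values()]
mean_percent = sum(percents) / len(percents)
print(round(mean_percent, 1))
```

The two smaller communities have relative half-widths of about 8 percent and Pahrump about 5 percent, which averages to roughly 7.2 percent.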

Two of the three communities are underestimated. In the case of Pahrump, it appears that 
the estimation technique was not able to capture all of the recent growth that appears to be 
spilling from the Las Vegas Valley into the Pahrump area. It is not known to what extent this 
was due to missing households on the frame and what was due to underestimating Pahrump’s 
PPH value. 


10. SUMMARY


While the local expert procedure may not provide satisfactory population estimates in all 
small, rural areas (e.g., vacation spots, with a high incidence of seasonal housing units and 
privately owned rental units), it appears to hold promise based on the data for the area included 
in this study. As with any estimation technique, the key criteria for determining if it could be 
implemented elsewhere revolve around the possibility of obtaining the required data and 
implementing the procedure within available funding. In the case of the local expert procedure, 
this would mean that utility data can provide the number of households and be used as a sample 
frame. Once a sample was selected, the procedure’s effectiveness would depend on the recruit- 
ment and knowledge of local experts. If these criteria can be met, the procedure would seem 
to be feasible. The next step would be to determine how accurate it is in a given application. 

We were not able to evaluate the accuracy of the age and other composition data estimated 
through the procedure at the time of this writing because these data were not yet available from 
the 1990 decennial census. However, we are encouraged by the test results for the total popula- 
tion, which indicate that the procedure has the potential for highly accurate estimates, even 
in small, rural areas experiencing rapid change. 


Single Stage Cluster Sampling in Prevalence-Incidence Surveys:
Some Issues Suggested by the Shanghai Survey 
of Alzheimer’s Disease and Dementia 


ZHISEN XIA, PAUL S. LEVY, ELENA S.H. YU, ZHENGYU WANG, 
and MINGYUAN ZHANG¹


ABSTRACT 


The scenario considered here is that of a sample survey having the following two major objectives: 
(1) identification for future follow up studies of n* subjects in each of H subdomains, and (2) estimation,
at the time the survey is conducted, of the level of some characteristic in each of these subdomains.
An additional constraint imposed here is that the sample design is restricted to single stage cluster sampling.
A variation of single stage cluster sampling called telescopic single stage cluster sampling (TSSCS) was
proposed in an earlier paper (Levy et al. 1989) as a cost effective method of identifying n* individuals
in each subdomain and, in this article, we investigate the statistical properties of TSSCS in cross-sectional
estimation of the level of a population characteristic. In particular, TSSCS is compared to ordinary single 
stage cluster sampling (OSSCS) with respect to the reliability of estimates at fixed cost. Motivation for 
this investigation comes from problems faced during the statistical design of the Shanghai Survey of 
Alzheimer’s Disease and Dementia (SSADD), an epidemiological study of the prevalence and incidence 
of Alzheimer’s disease and dementia. 


KEY WORDS: Single stage cluster sampling; Prevalence estimation; Telescopic single stage cluster 
sampling; Alzheimer’s disease; Dementia. 


1. BACKGROUND AND INTRODUCTION 


Many studies have both a cross-sectional component in which the levels of quantitative
variables or prevalences of dichotomous variables are estimated by means of a sample survey, 
and a longitudinal component in which a cohort of individuals is identified by means of the 
same sample survey and followed over a defined period for subsequent events. This type of 
study is especially common in the field of epidemiology in which estimates of the prevalence 
of a disease or condition are required both for the study population as a whole as well as for 
defined subgroups of it, and a sufficient number of individuals initially free of the disease or 
condition need to be identified within each of the defined subgroups for future estimation of 
the incidence of the disease or condition (cf. Kannel 1966). 

Design of a cost efficient sampling plan for such studies poses a challenge since sufficient 
numbers of individuals within each domain must be selected, often under some type of cluster 
sampling scheme, to ensure reliable estimation of both the prevalence and incidences discussed 
above. In this report, which has been motivated by a recent study conducted in China, we discuss 
these issues of sample design under a particular type of cluster sampling (single stage cluster 
sampling). 


¹ Zhisen Xia, Paul S. Levy and Zhengyu Wang, School of Public Health, University of Illinois at Chicago, P.O. Box
6998, Chicago, IL 60680, U.S.A.; Elena S.H. Yu, School of Public Health, San Diego State University, San Diego,


2. STATISTICAL FORMULATION


Let us suppose that a population consists of N individuals divided into H mutually exclusive
subdomains, each containing $N_h$ individuals (h = 1, ..., H). Suppose further that the
population is grouped into M clusters which will comprise the sampling units for the survey.
Let us assume that sampling of the clusters will be according to ordinary single-stage cluster
sampling (i.e., simple random sampling of clusters followed by selection of all individuals within
each sample cluster).

If we wish to identify with $100 \times (1 - \alpha)\%$ confidence at least $n_h^*$ individuals in a
particular domain, $h$, then the following number, $m_h'$, of clusters must be selected (cf. Levy et al.
1989):

$$m_h' = \left[ A_h + \left( A_h^2 + \frac{n_h^*}{\bar{N}_h} \right)^{1/2} \right]^2 \qquad (1)$$

where

$N_{hi}$ = the number of individuals in domain $h$, cluster $i$ $(i = 1, \ldots, M)$,

$V_{N_h} = \sigma_{N_h}/\bar{N}_h$,

$\sigma_{N_h}^2 = \sum_{i=1}^{M} (N_{hi} - \bar{N}_h)^2/(M - 1)$,

$z_\alpha$ = the $100\alpha$th percentile of the normal distribution

and

$A_h = |z_\alpha| \times V_{N_h}/2$.


The above assumes that the $N_{hi}$ are normally distributed over the M clusters. Also, the
number, $n_h^*$, of individuals needed in domain $h$ is based on statistical considerations relevant
to the longitudinal component of the study. For example, it could be based on the expected 
occurrence rate of the event of interest in the follow up period and the precision required for 
the estimate of this occurrence rate. 

If one also wishes to estimate with $100 \times (1 - \alpha)\%$ confidence the total or mean level of
some variable $\mathcal{X}$ to within $100 \times \varepsilon\%$ of its true value for each domain, $h$, then one would
require sampling of the following number, $m_h''$, of clusters in domain $h$:

$$m_h'' = \frac{z_{1-\alpha/2}^2 \, M \, V_{hx}^2}{z_{1-\alpha/2}^2 \, V_{hx}^2 + (M - 1)\,\varepsilon^2} \qquad (2)$$

where

$X_{hij}$ = the level of variable $\mathcal{X}$ for individual $j$ within domain $h$ of cluster $i$
$(j = 1, \ldots, N_{hi};\ i = 1, \ldots, M)$,

$X_{hi} = \sum_{j=1}^{N_{hi}} X_{hij}$,

and

$V_{hx}^2 = \sigma_{hx}^2/\bar{X}_h^2$.

For both of the specifications stated above to be satisfied within each domain, it follows that
we would require $m_h$ clusters to be sampled, where for $h = 1, \ldots, H$,

$$m_h = \max(m_h', m_h''). \qquad (3)$$

Without loss of generality, we can relabel the domains in order of increasing required $m_h$
(i.e., $m_1 \le m_2 \le \cdots \le m_H$).

Finally, in order for both of the specifications to be satisfied in each of the H domains under 
an ordinary single stage cluster sampling design, the number, m, of clusters required to be
sampled would be $m_H$. We note again that in ordinary single stage cluster sampling, every
vidual in every sample cluster is sampled. Thus, while the specifications of sample size are met 
minimally in domain H, the domain requiring the largest number of sample clusters, they are 
more than met in the other domains: 1, ..., H — 1. This inclusion in domains other than H 
of more individuals than are actually required could result in a survey that has overly expen- 
sive field costs. 

The alternative to ordinary single-stage cluster sampling that is generally used to avoid this 
needless expense would be a two-stage cluster sampling design with different second stage 
sampling fractions (i.e., oversampling) in each domain. Given, however, a scenario in which
it is not feasible to subsample at all within clusters, a methodology called single stage telescopic 
cluster sampling (SSTCS) was proposed in an earlier publication (Levy et al. 1989) which 
allowed the eligibility rule (i.e., the rule that determines which individuals are eligible for inclu- 
sion in the sample) to vary over the sample clusters. In this design, the particular domains 
included in the sample would not be the same for each sample cluster. This earlier publication 
demonstrated the usefulness of single stage telescopic sampling in surveys which have as a major
objective the identification for future longitudinal follow up of a certain number of individuals
in each of several domains. In this report, we will characterize the properties of estimates 
from this type of design and compare them to estimates from ordinary single-stage cluster 
sampling. 


3. TELESCOPIC SINGLE-STAGE CLUSTER SAMPLING


3.1 Sampling of Clusters 


As mentioned above, single-stage telescopic cluster sampling is proposed as a cost saving 
alternative to ordinary single-stage cluster sampling in situations where it is not feasible to sub- 
sample within sample clusters, and is performed as follows. If there are H mutually exclusive 
and exhaustive domains for which estimates are desired, and if m clusters are to be sampled, 
the m sample clusters are divided randomly into $m_1^*$ type 1 clusters, $m_2^*$ type 2 clusters, ...,
and $m_H^*$ type H clusters having the following properties: A type h cluster (h = 1, ..., H), as
illustrated below, has as eligible sample persons individuals in domains h, h + 1, ..., H, but
not in domains h’ where h’ < h.


Cluster           Domains Sampled
Type          1      2      h      H
  1           +      +      +      +
  2           -      +      +      +
  h           -      -      +      +
  H           -      -      -      +

‘‘+’’ = domain sampled; ‘‘-’’ = domain not sampled.


The term telescopic was suggested by the appearance of the above diagram. 


The number, $m_h^*$, of type h clusters is generally determined according to the following
strategy: Suppose that under single-stage cluster sampling, a sample of $m_h$ clusters as
determined by relation (3) is required for domain h (h = 1, ..., H); and, again supposing that
$m_1 \le m_2 \le \cdots \le m_H$, we would let:

$$m_1^* = m_1; \quad \text{and} \quad m_h^* = m_h - m_{h-1} \ \text{ for } h = 2, \ldots, H.$$


Clearly, this allocation results in a total of $m_H$ sample clusters being selected, with elements
in each domain, h, being sampled in $m_h$ sample clusters, exactly the number of clusters
required to achieve the specifications placed on the reliability of estimates and the
identification of individuals for future follow up. As discussed above, if ordinary single-stage cluster
sampling (OSSCS) were used, a sample of $m_H$ clusters would be needed to meet specifications
in domain H, but this would entail individuals in the other domains also being sampled in $m_H$
clusters, in excess of that needed to meet the stated specifications.
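
The allocation rule can be sketched in Python as follows. The function names are assumptions introduced here, and the required cluster counts are illustrative values only, not the $m_h$ of any actual design.

```python
def telescopic_allocation(required):
    """Given m_1 <= ... <= m_H, return the type counts m*_1, ..., m*_H."""
    if any(b < a for a, b in zip(required, required[1:])):
        raise ValueError("required cluster counts must be nondecreasing")
    # m*_1 = m_1 and m*_h = m_h - m_(h-1) for h = 2, ..., H.
    return [required[0]] + [b - a for a, b in zip(required, required[1:])]


def clusters_covering(allocation):
    """Number of sample clusters in which domain h is eligible (types 1 to h)."""
    totals, running = [], 0
    for count in allocation:
        running += count
        totals.append(running)
    return totals


required = [137, 185, 419]  # illustrative values only
allocation = telescopic_allocation(required)
print(allocation)
print(clusters_covering(allocation))
```

Because domain h is eligible in clusters of types 1 through h, the running totals of the allocation recover the required counts, and the last total equals the overall number of sample clusters.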


3.2 Characterization of Estimates 


Let

$$C_{hkx} = \sum_{i=1}^{M} (X_{hi} - \bar{X}_h)(X_{ki} - \bar{X}_k)/M,$$

$S_h = \{i_1, \ldots, i_{m_h}\}$ = the set of sample clusters having eligible persons in domain h.

The following results can then be obtained from combinatorial theory.


1. The estimated total, $x'_{\mathrm{tel}}$, under TSSCS of a population total X is given by

$$x'_{\mathrm{tel}} = \sum_{h=1}^{H} x'_h, \qquad (4)$$

where $x'_h$ is given by

$$x'_h = (M/m_h) \sum_{i \in S_h} X_{hi}.$$


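2. The mean, $E(x'_{\mathrm{tel}})$, and variance, $\mathrm{Var}(x'_{\mathrm{tel}})$, of $x'_{\mathrm{tel}}$ are given by

$$E(x'_{\mathrm{tel}}) = X, \qquad (5)$$

$$\mathrm{Var}(x'_{\mathrm{tel}}) = \sum_{h=1}^{H} \frac{M^2}{m_h} \, \frac{M - m_h}{M - 1} \Bigl( \sigma_{hx}^2 + 2 \sum_{k<h} C_{hkx} \Bigr). \qquad (6)$$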


These relationships follow in a straightforward way from combinatorial theory. 
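
To make the estimator in result 1 concrete, the sketch below computes the estimated total for a small invented sample. The function name, the data layout, and the numbers are all hypothetical; only the formula, in which each domain total is the cluster-level sum inflated by M over the number of sample clusters in which that domain is eligible, comes from the text.

```python
def telescopic_total(cluster_totals, cluster_types, n_clusters_in_population):
    """Estimated population total under telescopic single-stage cluster sampling.

    cluster_totals[i][h - 1] holds X_hi for sampled cluster i and domain h;
    cluster_types[i] is the type (1..H) assigned to sampled cluster i.
    """
    n_domains = len(cluster_totals[0])
    estimate = 0.0
    for h in range(1, n_domains + 1):
        # Domain h is eligible only in sample clusters of type h or lower.
        s_h = [i for i, t in enumerate(cluster_types) if t <= h]
        m_h = len(s_h)
        estimate += (n_clusters_in_population / m_h) * sum(
            cluster_totals[i][h - 1] for i in s_h
        )
    return estimate


# Hypothetical sample: 4 clusters drawn from M = 10, with H = 2 domains.
totals = [[3, 1], [5, 2], [4, 3], [6, 2]]
types = [1, 1, 2, 2]
print(telescopic_total(totals, types, 10))
```

In a real survey the domain 1 totals for type 2 clusters would not be observed at all; they appear in the toy data only so that every row has the same shape, and the function never reads them.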


4. COST COMPARISONS BETWEEN OSSCS AND TSSCS 


We can examine the comparative costs of OSSCS vs. TSSCS by considering the following
simple cost function that would be associated with OSSCS:

$$C_o = C_1 m_H + C_2 m_H (\bar{N}_1 + \bar{N}_2 + \cdots + \bar{N}_H) = m_H \Bigl( C_1 + C_2 \sum_{h=1}^{H} \bar{N}_h \Bigr) \qquad (7)$$

where $C_o$ is the expected cost, $C_1$ is the cost component associated with clusters (e.g., travel
to and from cluster, procurement of the list of enumeration units in the cluster, preparation
of materials for field work within the cluster, etc.) and $C_2$ is the cost component associated
with listing units (primarily travel between listing units and interviewing). It should also be
noted that the expression, $\sum_{h=1}^{H} \bar{N}_h$, is the average number of listing units per cluster. Again,
throughout this discussion the listing units are the individuals themselves. The analogous
expected cost, $C_t$, associated with telescopic sampling would then be given by:


$$C_t = C_1 m_H + C_2 (m_1 \bar{N}_1 + m_2 \bar{N}_2 + \cdots + m_H \bar{N}_H) = m_H \Bigl( C_1 + C_2 \sum_{h=1}^{H} \psi_h \bar{N}_h \Bigr) \qquad (8)$$

where $\psi_h = m_h/m_H$ (which is $\le 1$). Thus, the cost, $C_t$, associated with TSSCS is less than or
equal to that associated with an OSSCS of the same number of clusters, with the difference
being equal to

$$C_2 \, m_H \sum_{h=1}^{H} (1 - \psi_h) \bar{N}_h.$$
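
The two cost functions and the saving between them can be compared numerically. In the Python sketch below the unit costs and the required cluster counts are hypothetical, the function names are assumptions, and the average domain sizes per cluster are borrowed from the Shanghai example purely for illustration.

```python
def ordinary_cost(c1, c2, m_big, mean_sizes):
    """Cost function (7): every domain is enumerated in all m_H sample clusters."""
    return m_big * (c1 + c2 * sum(mean_sizes))


def telescopic_cost(c1, c2, required, mean_sizes):
    """Cost function (8): domain h is enumerated in only m_h sample clusters."""
    m_big = required[-1]
    return c1 * m_big + c2 * sum(m * n for m, n in zip(required, mean_sizes))


c1, c2 = 50.0, 5.0                   # hypothetical unit costs
required = [137, 185, 419]           # illustrative m_h, nondecreasing
mean_sizes = [10.985, 8.088, 3.478]  # average persons per cluster, by domain

m_big = required[-1]
saving = ordinary_cost(c1, c2, m_big, mean_sizes) - telescopic_cost(c1, c2, required, mean_sizes)
# Closed form for the difference: C_2 * m_H * sum over h of (1 - m_h / m_H) * N_h.
closed_form = c2 * m_big * sum((1 - m / m_big) * n for m, n in zip(required, mean_sizes))
print(round(saving, 2), round(closed_form, 2))
```

The saving does not depend on the cluster-level cost, since both designs visit the same number of clusters; it comes entirely from the individuals who are not enumerated.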

The most important comparison between the two sample designs, in many instances, would
be that involving their performance at equivalent cost in estimating the overall level, X, of a
characteristic, $\mathcal{X}$. An estimator, $x'_{\mathrm{ord}}$, based on an OSSCS of $m_H$ clusters (the number required
to meet the specifications within each domain) would have variance given by:

$$\mathrm{Var}(x'_{\mathrm{ord}}) = \frac{M^2}{m_H} \, \frac{M - m_H}{M - 1} \sum_{h=1}^{H} \Bigl( \sigma_{hx}^2 + 2 \sum_{k<h} C_{hkx} \Bigr). \qquad (9)$$


This is not the usual form of the variance (cf. Levy and Lemeshow 1991, chapter 9), but is
an algebraically equivalent form that can be compared directly with the variance of $x'_{\mathrm{tel}}$ based
on a TSSCS design with $m_H$ clusters sampled (equation (6)). The difference between these two
variances is given by

$$\mathrm{Var}(x'_{\mathrm{tel}}) - \mathrm{Var}(x'_{\mathrm{ord}}) = \frac{M^3}{M - 1} \sum_{h=1}^{H} \Bigl( \frac{1}{m_h} - \frac{1}{m_H} \Bigr) \Bigl( \sigma_{hx}^2 + 2 \sum_{k<h} C_{hkx} \Bigr) \qquad (10)$$


which is greater than or equal to zero. This is not surprising since an OSSCS of $m_H$ clusters
will invariably result in a larger overall sample size than a TSSCS of the same number of clusters.

Although an OSSCS of $m_H$ clusters will result in an estimator, $x'_{\mathrm{ord}}$, which has a lower
variance than the estimator, $x'_{\mathrm{tel}}$, resulting from a TSSCS of the same number, $m_H$, of clusters,
it does so at a higher cost. For this reason, it is more reasonable to compare $x'_{\mathrm{tel}}$ based on a
sample of $m_H$ clusters to $x'_{\mathrm{ord}}$ based on a sample of $m^*$ clusters, where $m^*$ is the number of
clusters that can be sampled from an OSSCS design at cost equivalent to that based on a TSSCS
design having $m_H$ sample clusters. From equations (7) and (8), it follows that $m^*$ is given by:

$$m^* = m_H \, \frac{C_1 + C_2 \sum_{h=1}^{H} \psi_h \bar{N}_h}{C_1 + C_2 \sum_{h=1}^{H} \bar{N}_h}. \qquad (11)$$


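It should be noted that

(1) $m^* \le m_H$.

(2) As $C_2/C_1 \to \infty$, then $m^* \to \bar{m}_H$,
where $\bar{m}_H = \sum_{h=1}^{H} m_h \bar{N}_h \Big/ \sum_{h=1}^{H} \bar{N}_h$.

(3) As $C_2/C_1 \to 0$, then $m^* \to m_H$,

and

(4) $m^*$ decreases monotonically with increase in $C_2/C_1$, which implies that $\bar{m}_H \le m^* \le m_H$.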
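
The behaviour of the equal-cost cluster count in the four notes above can be explored with a short Python sketch. It equates the two cost functions, as equation (11) does; the function names, the required cluster counts, and the cost ratios tried are assumptions chosen for illustration.

```python
def equal_cost_clusters(c1, c2, required, mean_sizes):
    """Number of OSSCS clusters affordable at the cost of a TSSCS of m_H clusters."""
    m_big = required[-1]
    shares = [m / m_big for m in required]  # m_h / m_H for each domain
    numerator = c1 + c2 * sum(s * n for s, n in zip(shares, mean_sizes))
    denominator = c1 + c2 * sum(mean_sizes)
    return m_big * numerator / denominator


def weighted_mean_clusters(required, mean_sizes):
    """Mean of the m_h, weighted by the average domain size per cluster."""
    return sum(m * n for m, n in zip(required, mean_sizes)) / sum(mean_sizes)


required = [137, 185, 419]           # illustrative m_h
mean_sizes = [10.985, 8.088, 3.478]  # average persons per cluster, by domain

for ratio in (0.0, 0.1, 1.0, 10.0, 1000.0):  # hypothetical values of C_2 / C_1
    print(ratio, round(equal_cost_clusters(1.0, ratio, required, mean_sizes), 1))
print(round(weighted_mean_clusters(required, mean_sizes), 1))
```

As the cost of enumerating individuals grows relative to the cost of preparing a cluster, the affordable number of ordinary clusters falls from $m_H$ toward the weighted mean, which is the lower bound in note (4).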

From the above analysis, we note that at a cost equivalent to that of a TSSCS of $m_H$
clusters, the variance of $x'_{\mathrm{ord}}$ (ignoring the finite population correction) will be inflated by at
most a factor equal to $m_H/\bar{m}_H$ over that which would have been obtained from an OSSCS of
$m_H$ clusters, where $\bar{m}_H$ is a weighted mean of the $m_h$ clusters required within each domain for
the domain specific specifications to be met. The weights in this instance are the $\bar{N}_h$, which
are the average number of individuals within each particular domain. It should be noted also
that the reduction in effective sample size of an OSSCS equivalent in cost to a TSSCS increases
with increase in $C_2/C_1$, which is essentially the ratio of the cost of extracting information from
sample individuals to that of preparing the sample clusters for the survey. This makes sense 
intuitively. 

The issues discussed above are illustrated in the next section with data from the Shanghai 
Survey of Alzheimer’s Disease and Dementia. 


5. THE SHANGHAI SURVEY OF ALZHEIMER’S 
DISEASE AND DEMENTIA 


The SSADD was planned in 1986 having as major objectives: (1) estimation of the prevalence 
of physical and mental impairments including Alzheimer’s and other dementing diseases among 
persons in each of three age groups (55-64 years, 65-74 years, and 75 years and older) in the
Jing-An district of Shanghai, China, and (2) identification of approximately 1,400 persons in each
of these 3 age groups for future determination of the incidence of these conditions. Jing-An
is one of twelve districts comprising the city of Shanghai, and was chosen as the target area
because of its relatively large and stable population of elderly and its proximity to the Shanghai 
Institute of Mental Health which was responsible for the field work. Findings from this study 
have been discussed by Zhang et al. (1990) and by Yu et al. (1989). Methodological issues have 
been discussed by Levy et al. (1988 and 1989).

The clusters in this survey are administrative entities called neighborhood groups consisting 
of geographically contiguous households having a well defined social and political structure. 
The strategy was to involve the leaders of neighborhood groups selected in the sample in the 
identification and recruitment of eligible persons. At the time of the planning of the survey, 
there were 4,066 neighborhood groups within the Jing-An District. This particular population 
of aging and elderly Chinese generally had a low level of education and had experienced in 
their lifetimes repeated periods of political upheaval and repression (e.g., the Warlords, the 
Japanese invasion, the Cultural Revolution), where being singled out or selected often had 
adverse consequences. For these reasons, it was felt strongly, especially by the local Chinese 
members of the research team who were most familiar with the target population, that any 
attempt to subsample persons in the target age groups within neighborhood groups that fall 
into the sample would compromise response rates and overall cooperation. 

Restricted to single stage cluster sampling and faced with a very tight deadline for designing 
the sample, the member of the study team responsible for the sample design (PSL) proposed 
a heuristic method that would result with reasonable certainty in the identification of 1,400 
individuals within each of the three target age groups. The resulting design was essentially a 
TSSCS in which 446 neighborhood groups were sampled. For details of this design, the reader 
is referred to the publications on the SSADD cited above. It should be emphasized that the 
resulting design was chosen purely on heuristic grounds and long before the theory behind this 
methodology was developed. 

Of the 446 neighborhood groups sampled, 149 were designated as type 1, and 136 of these 
contained at least 1 person in the target age group (55 years and above). Since only the type 1 
clusters have as eligible respondents all persons in each of the 3 target age groups, they can 
be used to estimate all of the parameters needed to evaluate the cost effectiveness of TSSCS 
relative to OSSCS. In the ensuing discussion, we will use the data from these 136 clusters to 
illustrate numerically how, on the basis of available "pilot" data, comparisons can be made 
between OSSCS and TSSCS with respect to cost effectiveness. From this sample of 136 clusters, 
we have for each domain, $h$, estimates of relevant parameters as shown below: 


Age      $\bar{N}_h$   $V_{N_h}$      $\bar{X}_h$                  $V_{hx}$       (header illegible; indexed by $k<h$)
55-64    10.985        .485           [illegible; possibly .125]   [illegible]    0.000
65-74     8.088        [illegible]    .360                         [illegible]    0.190
75+       3.478        .643           .456                         1.665          0.296


If we wish to identify with 95% confidence at least 1,400 persons in each age group, then from 
relation (1) and the data shown above, we would have 


$$A_1 = 1.645 \times 0.485/2 = 0.3989$$


and 


$$m_1' = \left[\,0.3989 + \left((0.3989)^2 + \frac{1{,}400}{10.985}\right)^{1/2}\right]^2 = 136.78 \approx 137.$$


Similarly, $m_2' = 185$ and $m_3' = 419$. 
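
The arithmetic above can be checked with a short sketch. Relation (1) is not reproduced in this section, so the formula inside `clusters_needed` is an assumption inferred from the worked computation for the first age group ($A = zV/2$, then $m' = [A + (A^2 + n/\bar{N})^{1/2}]^2$); the function and argument names are illustrative only.

```python
import math


def clusters_needed(target_persons, mean_per_cluster, variability, z=1.645):
    """Clusters needed to identify `target_persons` eligible individuals.

    Assumed form, inferred from the worked example rather than quoted
    from relation (1):
        A = z * variability / 2
        m = (A + sqrt(A**2 + target_persons / mean_per_cluster)) ** 2
    """
    a = z * variability / 2.0
    return (a + math.sqrt(a * a + target_persons / mean_per_cluster)) ** 2


# Ages 55-64: 10.985 eligible persons per cluster on average, variability term 0.485.
m1 = clusters_needed(1400, 10.985, 0.485)
print(round(m1, 2), math.ceil(m1))
```

Rounding up to the next whole cluster gives the 137 clusters quoted for the first domain. The inputs for the other two domains are not fully legible in the table, so they are not recomputed here.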


Let us suppose that for each of the three age groups, we wish to estimate with 80% confidence 
to within 30% of its true value the proportion, $\bar{X}_h$, of persons showing evidence of 
cognitive dysfunction as judged by a score below 18 on the Mini Mental State Examination 
(MMSE), which is a screening test for cognitive dysfunction. From these same data, we have 
the following estimates of the parameters necessary to determine the number of sample clusters 
required to meet this specification: 


From relation (2), with $M = 4{,}066$, $\varepsilon = 0.30$, and $z_{1-\alpha/2} = 1.28$, we have the following values 
of $m_h''$: 


$$m_1'' = 157; \quad m_2'' = 99; \quad m_3'' = 50$$


and from relation (3), the number, $m_h$, of clusters required to satisfy both conditions in each 
domain is given by: 


$$m_1 = \max(137, 157) = 157; \quad m_2 = \max(185, 99) = 185; \quad m_3 = \max(419, 50) = 419.$$


Thus, for an OSSCS design to satisfy both specifications, the number, m, of clusters required 
to be sampled would be 419. Likewise, a TSSCS design having 157 type 1, 28 type 2, and 234 
type 3 sample clusters would satisfy both requirements. 
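
The combination rule and the resulting cluster counts can be sketched as follows. The text states that type 1 clusters enumerate all three age groups. The reading that type 2 clusters enumerate the two older groups and type 3 clusters only the oldest is an assumption, chosen because it reproduces the counts quoted above. The function names are illustrative.

```python
def combine(m_identify, m_precision):
    """Per-domain requirement: the larger of the two specifications."""
    return [max(a, b) for a, b in zip(m_identify, m_precision)]


def telescoped_counts(m_required):
    """Clusters of each type, assuming the requirements are non-decreasing
    across domains and type k clusters enumerate domains k, k+1, ...
    (an assumption, not a quotation of the design)."""
    counts, covered = [], 0
    for m in m_required:
        counts.append(m - covered)
        covered = m
    return counts


m = combine([137, 185, 419], [157, 99, 50])
print(m)                     # per-domain requirements
print(telescoped_counts(m))  # type 1, type 2, type 3 clusters
print(max(m))                # clusters needed when every cluster is fully enumerated
```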

The cost components, $C_1$ and $C_2$, expressed in person hours, are estimated to be 20 and 2 
respectively. The relatively high cost component, $C_1$, associated with clusters is due to the fact 
that once a neighborhood group is selected in the sample, many hours must be spent obtaining 
the list of households and persons from a central bureau and enlisting the support of the 
neighborhood group leaders. The cost component, C2, of 2 person hours associated with 
individuals involves primarily interview and call-back activities. Thus, the field costs, Co, 
associated with an OSSCS design that satisfies both specifications is (from relation (7)) 27,278 
person hours as compared to a cost of 17,737 person hours (from relation (8)) associated with 
a TSSCS design that satisfies both specifications. This represents a 35% savings in field costs, 
which is substantial. 

From relation (9), we calculate that the estimate, xJ,4, of the number of persons over all 
3 age groups having evidence of a cognitive disorder based on an OSSCS of 419 sample clusters 
would have variance equal to 70,844, whereas x;., the analogous estimate based on a TSSCS 
also with 419 clusters, would have variance equal to 122,744, which is 73% greater than the 
variance of the OSSCS estimate (equivalently, the OSSCS variance is 42% smaller). However, an OSSCS design having the same field costs as 
a TSSCS design based on 419 sample clusters would permit only 208 clusters to be sampled 
(relation (11)). The variance of xj,4 based on an OSSCS design with 208 sample clusters would 
be estimated to be 141,733, which is 15% higher than the variance of the analogous TSSCS 
estimate having the same field cost. Also, the OSSCS design having 208 sample clusters would 
not satisfy the two specifications placed on the estimates. 
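
The percentage comparisons in the last two paragraphs follow directly from the quoted costs and variances, as the sketch below shows (the helper name `pct_diff` is illustrative). The gap between the two 419-cluster variances is about 42% when expressed relative to the TSSCS variance and about 73% when expressed relative to the OSSCS variance.

```python
def pct_diff(value, reference):
    """Difference between value and reference, as a percentage of reference."""
    return 100.0 * (value - reference) / reference


cost_osscs, cost_tsscs = 27278, 17737        # person hours
var_osscs_419, var_tsscs_419 = 70844, 122744  # both designs with 419 clusters
var_osscs_208 = 141733                        # OSSCS at the TSSCS field cost

cost_saving = -pct_diff(cost_tsscs, cost_osscs)          # TSSCS saving vs OSSCS
gap_vs_osscs = pct_diff(var_tsscs_419, var_osscs_419)    # TSSCS variance above OSSCS
gap_vs_tsscs = -pct_diff(var_osscs_419, var_tsscs_419)   # OSSCS variance below TSSCS
equal_cost_gap = pct_diff(var_osscs_208, var_tsscs_419)  # OSSCS(208) above TSSCS(419)

for label, value in [("cost saving", cost_saving),
                     ("variance gap vs OSSCS", gap_vs_osscs),
                     ("variance gap vs TSSCS", gap_vs_tsscs),
                     ("equal-cost variance gap", equal_cost_gap)]:
    print(f"{label}: {value:.1f}%")
```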


6. DISCUSSION 


The methodology, TSSCS, discussed here and in earlier publications, arose from a situa- 
tion in which cluster sampling was clearly indicated but a definite ‘‘red light’’ was given to 
any subsampling within clusters. For the Shanghai Survey of Alzheimer’s Disease and Dementia 
considered here, the two major objectives were to identify a certain number of individuals within 
each of 3 domains (age groups in this instance) and to obtain domain specific estimates meeting 
certain specifications pertaining to precision. Based on results presented above for this par- 
ticular survey, it appears that this method could result in considerable savings in field costs 
without compromising objectives. 

One might raise questions concerning the general applicability of this methodology. It would 
be of use primarily in situations where it is either not feasible or too costly to subsample clusters 
and the individuals do not have to be screened to determine whether they belong to one of the 
target domains (in the SSADD, the leadership of the sample neighborhood groups provided 
a list of all persons in the neighborhood group along with information on date of birth). Such 
scenarios may occur, for example, in surveys where data are abstracted from records by per- 
sonnel sufficiently familiar with the records to abstract information, but not considered capable 
of sampling the records without expensive supervision. Again, in such situations, TSSCS may 
provide a reasonable alternative. 

Figure 2. Standard deviation of the adjustment factors by second-level stratum. 
Figure 1. Adjustment factors by second-level stratum. 

In This Issue 


In August of 1991 a symposium in honour of Professor V.P. Godambe on the occasion of 
his 65th birthday was held at the University of Waterloo. Papers presented at this symposium 
were in the areas of foundations of inference, theory of estimation, and theory of survey sampling, 
all areas in which Professor Godambe has an interest and to which he has made significant 
contributions. The special section Inference with Survey Data in this issue, which is dedicated 
to Professor Godambe, contains some of the sampling related papers from the symposium. As 
a group these papers discuss many important issues for inference with survey data such as the 
role of modelling, robustness, complex survey designs, resampling methods, and the effects of 
imputation. 

Royall considers model based estimation for finite population parameters. He describes the 
conflict between designs which provide model efficiency and those which are robust to model 
failure. Robustness is achieved through balanced samples. He presents a class of models for which 
the optimal sample is already balanced so that, for models in that class, there is no conflict between 
robustness and efficiency. 

Smith and Njenga discuss model based and randomization based inference for sample surveys 
and suggest a robust non-parametric modelling approach to inference. Based on simulations using 
both real and synthetic data, they conclude that their estimator of a regression coefficient is robust 
to violations of assumptions of linearity and homoscedasticity, has good efficiency, and has 
reasonable conditional and unconditional properties. 

Rao, Wu, and Yue review recent developments in resampling methods for complex survey 
designs, particularly the jackknife, balanced repeated replication, and the bootstrap. In a simula- 
tion study using a synthetic population they evaluate and compare variance estimators and 
confidence intervals for the population median. 

Mantel considers model assisted estimation of a finite population mean based on a sample 
survey. He suggests that models should be extended so that the finite population mean is a known 
function of the optimal census based estimate of a model parameter. The extended model is then 
a compromise between model efficiency and finite population relevance. 

Krieger and Pfeffermann discuss maximum likelihood estimation of model parameters. They 
describe various approaches in the literature and consider the problem of informative designs. 
They propose the use of weighted distributions where the weights are modelled as functions of 
the covariates and of the variable of interest. The approach performs reasonably well in a 
simulation study. 

In the final paper of this special section Sarndal considers the problem of variance estimation 
when imputation is used to complete a data set. Overall variance is derived as the sum of a 
sampling variance and an imputation variance. The suggested variance estimator is a design 
based estimator of the sampling variance with a model based correction for bias and a model 
based estimator of the imputation variance. Some examples and an empirical evaluation are 
presented. 

Armstrong and Wu formulate the problem of sample allocation for a general two-phase 
survey design as a constrained programming problem. By exploiting its mathematical structure, 
they propose a solution that consists of iterations between two subproblems that are computa- 
tionally much simpler. They provide empirical results showing that the proposed method works 
very well. 




Couper and Groves examine whether experienced interviewers achieve higher response rates 
than inexperienced interviewers, controlling for differences in survey design and attributes of 
the population assigned to them. After demonstrating that the relationship is positive and 
curvilinear, they attempt to explain the mechanisms by which experienced interviewers achieve 
these rates and elaborate the nature of the relationship. 

Lahiri and Wang propose new estimators for the ‘‘cost weights’’ and ‘‘relative importances’’ 
which are needed to construct the U.S. Consumer Price Index Numbers. The proposed estimators 
are composite estimators that combine information from relevant sources. A numerical 
comparison with four rival estimators is also presented. 

Robustness and Optimal Design Under Prediction 
Models for Finite Populations 


RICHARD M. ROYALL¹ 


ABSTRACT 


In many finite population sampling problems the design that is optimal in the sense of minimizing the 
variance of the best linear unbiased estimator under a particular working model is bad in the sense of 
robustness — it leaves the estimator extremely vulnerable to bias if the working model is incorrect. However 
there are some important models under which one design provides both efficiency and robustness. We 
present a theorem that identifies such models and their optimal designs. 


KEY WORDS: Balanced sample; Bias protection; Model failure; Working model. 


1. INTRODUCTION 


The "ratio estimator" of a finite population total $T = y_1 + \dots + y_N$ is $\hat{T} = N\bar{x}\,\bar{y}_s/\bar{x}_s$, 
where $\bar{x} = (x_1 + \dots + x_N)/N$ is the known population mean of an auxiliary variable and 
$\bar{x}_s$ and $\bar{y}_s$ are sample means. This is the best linear unbiased (BLU) estimator of $T$ under the 
model $M$: 


$$E(Y_i) = \beta x_i$$


$$\operatorname{cov}(Y_i, Y_j) = \begin{cases} \sigma^2 x_i & i = j \\ 0 & \text{else.} \end{cases}$$
This estimator is biased under alternative models having different regression functions, in 
general, but protection against bias under specific alternatives can be assured by careful choice 
of the sample, as will be described below. 

Throughout this paper we will be concerned with populations for which a particular model, 
such as M, is believed to apply, at least to a satisfactory degree of approximation. Our inferences 
will be made with reference to this model. For example, we will call an estimator $\hat{T}$ unbiased 
only if $E_M(\hat{T} - T) = 0$. On the other hand, we recognize that the model is an approximation 
and that it might be seriously wrong. Thus we describe it as a working model, and seek sampling 
and estimation procedures that are robust in the sense of performing well, not only under that 
working model, but also under alternative models that might better describe the relationships 
between variables in our population. 

We denote by $M(\delta_0, \delta_1, \dots, \delta_J : v)$ the general polynomial regression model: 


$$E(Y_i) = \sum_{j=0}^{J} \delta_j \beta_j x_i^j$$


1 Richard M. Royall, Johns Hopkins University, Baltimore, MD 21205 U.S.A. 

where $\delta_j$ is a zero-one indicator of whether the regressor $x^j$ is included in the model. The best 
linear unbiased estimator under this model is denoted by $\hat{T}(\delta_0, \dots, \delta_J : v)$. Thus our first 
model was $M(0, 1 : x)$, and $\hat{T}(0, 1 : x)$ is the ratio estimator. 

Royall and Herson (1973) showed that $\hat{T}(0, 1 : x)$ remains unbiased under $M(\delta_0, \dots, \delta_J : v)$ 
for any vector $(\delta_0, \dots, \delta_J)$ of zeroes and ones, and any $v_1, \dots, v_N$, if the sample is balanced 
on $x, x^2, \dots, x^J$: 


$$\sum_s x_i^j / n = \sum_1^N x_i^j / N, \qquad j = 1, 2, \dots, J.$$

This means that in a balanced sample $\hat{T}(0, 1 : x)$ is robust in the sense that it remains unbiased 
under regression models that are much more general than the working model $M(0, 1 : x)$. 
Royall and Herson (1973, sec. 4.5) also detailed how approximate balance ensures the 
approximate unbiasedness of $\hat{T}(0, 1 : x)$. Furthermore they showed that in a balanced sample 
this estimator retains not only its unbiasedness but also its optimality under a wide variety of 
polynomial regression models, including $M(1 : 1)$, $M(1, 1 : x)$, and $M(0, 1, 1 : x)$. 
Specifically, the estimator is optimal under any polynomial regression model of degree $J$ or 
less, provided only that the model's variance function is expressible as a linear combination 
of the regressors. 
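
A small numerical sketch may help make the bias protection concrete. It computes the model bias of the ratio estimator, $E(\hat{T} - T)$, when the true mean function is a quadratic rather than the working model. The toy population (two copies of the same four $x$-values, so that one copy is exactly balanced on every power of $x$) and the coefficients are hypothetical choices made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def ratio_estimator_bias(x_pop, sample_idx, beta):
    """Model bias E(T_hat - T) of the ratio estimator N * xbar * ybar_s / xbar_s
    when E(Y_i) = sum_j beta[j] * x_i**j."""
    mu = [sum(b * x ** j for j, b in enumerate(beta)) for x in x_pop]  # E(Y_i)
    xs = [x_pop[i] for i in sample_idx]
    mus = [mu[i] for i in sample_idx]
    expected_estimate = len(x_pop) * mean(x_pop) * mean(mus) / mean(xs)
    return expected_estimate - sum(mu)


x_pop = [1.0, 2.0, 3.0, 4.0] * 2   # hypothetical population of N = 8 units
beta = [5.0, 2.0, 0.5]             # quadratic mean function, not the working model
balanced = [0, 1, 2, 3]            # sample moments equal population moments
largest = [2, 3, 6, 7]             # the n = 4 units with the largest x-values

bias_balanced = ratio_estimator_bias(x_pop, balanced, beta)
bias_largest = ratio_estimator_bias(x_pop, largest, beta)
print(bias_balanced, bias_largest)
```

In this toy case the balanced sample gives zero bias, while the sample of largest units (optimal under the working model) is biased once the intercept and quadratic terms are present.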

The robustness of the ratio estimator in balanced samples is achieved at a high cost in 
efficiency under the original working model M(0, 1 : x). Under this model the sample that 
minimizes the variance consists of the n units whose x-values are largest, and the efficiency 
of a balanced sample is only $\bar{x}/\max_s(\bar{x}_s)$ (Royall and Herson 1973). 

For the linear regression estimator, theoretical results have been established that are quite 
analogous to those sketched above for the ratio estimator, but with one important difference. The 
estimator is $\hat{T}(1, 1 : 1) = N[\bar{y}_s + b(\bar{x} - \bar{x}_s)]$, where $b = \sum_s (x_i - \bar{x}_s) y_i / \sum_s (x_i - \bar{x}_s)^2$. It 
is the optimal (BLU) estimator under the constant variance linear regression model, $M(1, 1 : 1)$. 
When the sample is balanced, this estimator is robust, remaining unbiased (and optimal) under 
the same broad class of polynomial regression models as the ratio estimator. But unlike the 
ratio estimator, the regression estimator achieves robustness in balanced samples at no cost 
in efficiency: the variance under the working model $M(1, 1 : 1)$ is minimized in balanced 
samples, where $\bar{x}_s = \bar{x}$. This phenomenon occurs because the error variance $E(\hat{T} - T)^2$ is 
the sum of a constant and a term proportional to $(\bar{x} - \bar{x}_s)^2 \operatorname{var}(b)$. Minimizing $\operatorname{var}(b)$ 
requires maximizing $\sum_s (x_i - \bar{x}_s)^2$, but this term is eliminated altogether in samples with 
$\bar{x}_s = \bar{x}$. 

Are there other models under which the same sample that minimizes the variance of the 
BLU estimator can also protect against bias under a wide range of alternative models? In 
particular, are there such models for problems requiring non-constant variance functions? We 
show that the answer is positive, giving a theorem that characterizes a family of models with 
the desired property and identifies the corresponding optimal samples. The results in this paper 
integrate and generalize those of Kott (1984) and Tallis (1986). They are also closely related 
to the work of Pereira and Rodrigues (1983) and Tam (1986), as well as that of Isaki and 
Fuller (1982). 


2. BASIC RESULTS 


It is convenient to shift to vector and matrix notation, in which $Y$ is the population vector 
$(Y_1, Y_2, \dots, Y_N)'$ and the model $M(X : V)$ specifies that $E(Y) = X\beta$ and $\operatorname{var}(Y) = V\sigma^2$, 
where $X$ is an $N \times p$ matrix of regressors, $V$ is diagonal, and the vector $\beta$ and the scalar $\sigma^2$ 
are unknown. For a given sample $s$ of $n$ units we list the sample units first, so that 


$$Y = \begin{pmatrix} Y_s \\ Y_r \end{pmatrix}, \qquad X = \begin{pmatrix} X_s \\ X_r \end{pmatrix}, \qquad V = \begin{pmatrix} V_s & 0 \\ 0 & V_r \end{pmatrix},$$


where $Y_r$ is the $(N - n)$-vector corresponding to the non-sample units, etc. We let $1_s$ and $1_r$ 
denote vectors $(1, \dots, 1)'$ of lengths $n$ and $(N - n)$. 

The population total is $T = 1_s'Y_s + 1_r'Y_r$. After the sample $s$ is observed, the first 
component, $1_s'Y_s$, is known. The BLU estimator of $T$ is obtained by adding to this known 
quantity the BLU predictor of $1_r'Y_r$: 


$$\hat{T}(X : V) = 1_s'Y_s + 1_r'X_r\hat{\beta}(X : V),$$

where $\hat{\beta}(X : V) = (X_s'V_s^{-1}X_s)^{-1}X_s'V_s^{-1}Y_s$. The error variance is 

$$\operatorname{var}(\hat{T}(X : V) - T) = [\,1_r'X_rA_s^{-1}X_r'1_r + 1_r'V_r1_r\,]\sigma^2,$$


where $A_s = X_s'V_s^{-1}X_s$. These formulas simplify when the vector $V1$ is in the linear manifold 
generated by the columns of $X$, which we denote by $\mathcal{M}(X)$. 


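Lemma 1. If $V1 \in \mathcal{M}(X)$ then 

$$\hat{T}(X : V) = 1'X\hat{\beta}(X : V)$$

and under $M(X : V)$ 

$$\operatorname{var}(\hat{T}(X : V) - T) = (1'XA_s^{-1}X'1 - 1'V1)\sigma^2.$$


Proof: The estimator simplifies because $V1 \in \mathcal{M}(X)$ means that $V1 = Xc$ for some vector 
$c$, so that $X_s'1_s = X_s'V_s^{-1}X_sc$, from which we have $1_s'X_s\hat{\beta} = c'X_s'V_s^{-1}Y_s = 1_s'Y_s$. The variance 
formula follows from $\operatorname{cov}(\hat{T}, T) = \operatorname{cov}(1'X\hat{\beta}, 1_s'Y_s) = 1'XA_s^{-1}X_s'1_s\sigma^2 = 1'Xc\,\sigma^2 = 1'V1\sigma^2$. 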


Lemma 1 shows that for models with $V1 \in \mathcal{M}(X)$, the sample affects the variance only 
through $A_s^{-1}$. This simplifies both the study of how the variance depends on the sample and 
the search for efficient samples. 

The collection of samples that satisfy 


$$1_s'W_s^{-1/2}X_s / n = 1'X / 1'W^{1/2}1,$$


where $W$ is an $N \times N$ matrix, will be denoted by $B(X : W)$. When $W$ is the identity matrix, 
$I$, $B(X : I)$ is the collection of samples that are balanced on the columns of $X$. Royall and 
Herson (1973) proved that BLU estimators under a wide family of polynomial regression models 
are greatly simplified in balanced samples: 

Theorem 1. Under $M(X : V)$ with $V1 \in \mathcal{M}(X)$, if $s \in B(X : I)$ then 

$$\hat{T}(X : V) = (N/n)\,1_s'Y_s$$

$$\operatorname{var}(\hat{T}(X : V) - T) = [(N/n) - 1]\,1'V1\sigma^2. \qquad (1)$$


The next theorem shows that if $V = I$ then the variance in (1) is the minimum possible, i.e. 
balanced samples, $B(X : I)$, are optimal if $1 \in \mathcal{M}(X)$; it also identifies optimal samples for 
a class of models with more general variance structure. 


Theorem 2. Under $M(X : V)$ if both $V1$ and $V^{1/2}1 \in \mathcal{M}(X)$, then 

$$\operatorname{var}(\hat{T}(X : V) - T) \ge [\,(1'V^{1/2}1)^2/n - 1'V1\,]\sigma^2;$$

the bound is achieved if and only if $s \in B(X : V)$, in which case 

$$\hat{T}(X : V) = (1'V^{1/2}1)(1_s'V_s^{-1/2}Y_s)/n.$$


Proof: Since $V1 \in \mathcal{M}(X)$, the quantity to be minimized is $a'A_s^{-1}a$, where $a = X'1$ 
(Lemma 1). Now $V^{1/2}1 \in \mathcal{M}(X)$ implies that there is a $p$-vector $c_1$ for which $V^{1/2}1 = Xc_1$, and, 
since $V$ is diagonal, this ensures that $V_s^{1/2}1_s = X_sc_1$ for every sample $s$. From this it follows 
that $c_1'A_sc_1 = n$, and the desired inequality then follows from Schwarz's: 


$$(a'A_s^{-1}a)(c_1'A_sc_1) = n\,(a'A_s^{-1}a) \ge (a'c_1)^2 = (1'V^{1/2}1)^2.$$


The necessary and sufficient condition for equality is $a' = kc_1'A_s$, where $k = 1'V^{1/2}1/n$. This 
is equivalent to $s \in B(X : V)$ because $c_1'A_s = 1_s'V_s^{-1/2}X_s$. The simple forms for the estimator 
$\hat{T}(X : V)$ and its variance are then easily obtained algebraically. 

The formulas in Theorem 2 are familiar in conventional (randomization-based) sampling 
theory. The BLU estimator $\hat{T}(X : V)$ takes the simple form of the Horvitz-Thompson estimator 
$\hat{T}_{HT} = \sum_s y_i/\pi_i$, when $\pi_i$, the inclusion probability for unit $i$, is proportional to $v_i^{1/2}$. And the 
variance bound is the one established by Godambe and Joshi (1965, Theorem 6.1) for the 
model-based expectation of the random sampling variance. 

Suppose that we have, for a working model $M(X : V)$ that satisfies the conditions of 
Theorem 2, an optimal sample $s$ and BLU estimator $\hat{T}$. If we now consider a more general model 
M(X, Z: V) with additional regressor(s) Z, the results of Theorem 2 continue to apply so long 
as the sample belongs to B(Z: V) as well as to B(X : V). Our sample and estimator remain 
optimal under the more general model, and the variance is unchanged. That is, we can maintain 
optimality under our working model (minimum variance sample and BLU estimator) and also 
protect against bias caused by the additional regressor(s) Z by imposing the additional constraint 
B(Z: V) on the sample. This procedure not only protects our estimator from bias under 
M(X, Z: V), it ensures that our sample and estimator both remain optimal under the more 
general model. Of course unbiasedness is ensured under the even more general model 
M(X, Z: W), where W is any covariance matrix. 


3. EXAMPLES 


Four models have been particularly prominent in finite population sampling theory. In the 
polynomial regression model notation of section 1 these are $M(1 : 1)$, $M(1, 1 : 1)$, $M(0, 1 : x)$, 
and $M(0, 1 : x^2)$. Optimal estimators under the first three models are the expansion, 
regression and ratio estimators, respectively. The optimal estimator under the fourth model, 
$\hat{T}(0, 1 : x^2) = \sum_s y_i + (N - n)\bar{x}_r \sum_s (y_i/nx_i)$, is approximated by the mean-of-ratios 
estimator $\hat{T}_{MR} = N\bar{x}\sum_s (y_i/nx_i)$ when the sampling fraction $n/N$ is small. 

One approach to finding a practical sampling and estimation strategy under one of these 
four working models is to use the best linear unbiased estimator under the model, while ensuring 
robustness by choosing a sample in which the estimator remains unbiased under more general 
polynomial regression models. For the first two models, M(1:1) and M(1, 1:1), we have 
seen that this strategy produces bias-robustness for free, at no cost in efficiency under the 
working model. Under both of these models bias protection requires simple (unweighted) 
balance; but the models satisfy the conditions of Theorem 2 with $V = I$, which implies that 
simple balance is optimal. 

For the other two models, however, there is tension between robustness and efficiency. In 
section 1 we noted that under $M(0, 1 : x)$ the ratio estimator is optimal, and while the optimal 
sample consists of the $n$ units maximizing $\bar{x}_s$, protection from bias under $M(1, 1 : x)$ requires 
a sample where $\bar{x}_s$ is not maximized but set equal to the population mean, $\bar{x}$. The situation 
under $M(0, 1 : x^2)$ is similar: the optimal sample is again the one where the sample mean $\bar{x}_s$ 
is maximized, but protection of the optimal estimator against bias under polynomial regression 
models requires an "overbalanced" sample, in which the sample mean equals $\sum_1^N x_i^2 / \sum_1^N x_i$ 
(Scott, Brewer and Ho 1978). 

Under both of these models, $M(0, 1 : x)$ and $M(0, 1 : x^2)$, robustness can be achieved at a 
smaller cost in efficiency by starting with a more general working model. Theorem 2 shows the 
way. Consider first the model $M(0, 1 : x^2)$. If we use $\hat{T}(0, 1 : x^2)$ in an over-balanced sample, 
the error variance is $\{(N\bar{x})^2/n - \sum_1^N x_i^2 + \sum_s (x_i - \bar{x}_s)^2\}\sigma^2$. But if we use the more general 
working model $M(0, 1, 1 : x^2)$ and estimator $\hat{T}(0, 1, 1 : x^2)$, the theorem shows that any sample 
in which $\bar{x}_s = \sum_1^N x_i^2 / \sum_1^N x_i$ is optimal, yielding the minimum variance $\{(N\bar{x})^2/n - \sum_1^N x_i^2\}\sigma^2$. 
Now bias protection against even more general polynomial regression models can be obtained 
at no cost in efficiency by imposing the additional constraints of Condition $B(X : V)$, i.e. 
$\sum_s x_i^{j-1}/n = \sum_1^N x_i^j / \sum_1^N x_i$, $j = 0, 3, \dots, J$. Under these constraints on the sample, collectively 
called $\pi$-balance, $\hat{T}(0, 1, 1 : x^2)$ is the mean-of-ratios estimator (Kott 1984). This sample and 
estimator remain optimal under all models of the form $M(\delta_0, 1, 1, \delta_3, \dots, \delta_J : x^2)$. 

Balanced samples $B(X : V)$ do not always exist. The above example illustrates this; when 
$n$ becomes so large that $n/N > N(\bar{x}^2)/\sum_1^N x_i^2$ there can be no $\pi$-balanced sample, because 
otherwise the variance formula would become negative. Note that the condition 
$n/N > N(\bar{x}^2)/\sum_1^N x_i^2$ implies that $\max(x_i) > N\bar{x}/n$, so that in such populations there is no probability 
sampling plan with inclusion probability proportional to $x$. 

To generalize the other model, $M(0, 1 : x)$, so that the theorem will apply we can add a 
regressor, $x^{1/2}$: 


$$E(Y_i) = \beta_{1/2}x_i^{1/2} + \beta_1x_i$$


$$\operatorname{var}(Y_i) = \sigma^2x_i.$$


According to Theorem 2 any sample satisfying 


$$\sum_s x_i^{1/2}/n = \sum_1^N x_i \Big/ \sum_1^N x_i^{1/2} \qquad (2)$$


is optimal under this model, yielding the best linear unbiased estimator $\sum_1^N x_i^{1/2}\sum_s x_i^{-1/2}y_i/n$ 
and the minimum variance, $\{(\sum_1^N x_i^{1/2})^2/n - N\bar{x}\}\sigma^2$. This variance compares favorably with 
that of the ratio estimator in a balanced sample, $N\bar{x}(N/n - 1)\sigma^2$. Now optimality of the sample 
and the estimator if in fact $E(Y_i) = \beta_0 + \beta_{1/2}x_i^{1/2} + \beta_1x_i + \beta_2x_i^2$ can be maintained (with 
no increase in variance) by imposing the additional conditions on the sample: 


$$\sum_s x_i^{-1/2}/n = N \Big/ \sum_1^N x_i^{1/2}$$

$$\sum_s x_i^{3/2}/n = \sum_1^N x_i^2 \Big/ \sum_1^N x_i^{1/2} \qquad (3)$$


These conditions, (2) and (3), give the BLU estimator the simple form: 


$$\sum_1^N x_i^{1/2}\sum_s x_i^{-1/2}y_i/n,$$


which is of course the Horvitz-Thompson estimator for a probability-proportional-to-$x^{1/2}$ 
sampling plan. 
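
The comparison between the minimum variance under the generalized model and the variance of the ratio estimator in a balanced sample can be illustrated numerically. The sketch below evaluates the two expressions quoted in this section, in units of $\sigma^2$, for a hypothetical set of auxiliary values. It does not check whether a sample attaining the bound exists for this toy population, so the first quantity should be read as a lower bound only.

```python
import math


def min_variance_generalized(x, n):
    """{(sum_i x_i^{1/2})^2 / n - N * xbar}: minimum error variance under the
    model with regressors x^{1/2} and x and variance proportional to x."""
    return sum(math.sqrt(v) for v in x) ** 2 / n - sum(x)


def variance_ratio_balanced(x, n):
    """N * xbar * (N/n - 1): error variance of the ratio estimator
    in a sample balanced on x."""
    return sum(x) * (len(x) / n - 1.0)


x = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]   # hypothetical auxiliary values, N = 6
n = 2
v_general = min_variance_generalized(x, n)
v_ratio = variance_ratio_balanced(x, n)
print(v_general, v_ratio)
```

By the Schwarz inequality $(\sum x_i^{1/2})^2 \le N\sum x_i$, so the first quantity can never exceed the second, whatever the population.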


4. PROBABILITY SAMPLING 


The results in Section 2 are important in relation to an unobserved regressor $Z$. If $Z$ were, 
like $X$, known for all population units, then we could use $M(X, Z : V)$ as the working model 
and $\hat{T}(X, Z : V)$ as the estimator in the first place. But suppose that we are unaware of the 
importance of $Z$ and are using the working model $M(X : V)$ and the estimator $\hat{T}(X : V)$ when 
in fact $M(X, Z : V)$ applies. In this context we will refer to a sample from $B(X : V)$ as 
"balanced on $X$." Although we can choose a sample that is balanced on $X$, we cannot ensure 
that it will be balanced on $Z$, and if it is not, then our estimator is biased: 


$$E(\hat{T}(X : V) - T) = [\,(1'V^{1/2}1)(1_s'V_s^{-1/2}Z_s)/n - 1'Z\,]\gamma,$$


where $\gamma$ is the $Z$-coefficient: $E(Y) = X\beta + Z\gamma$. 


Random sampling can help to provide protection against biases like this. If we use a probability 
sampling plan with inclusion probabilities, $\pi_i = nv_i^{1/2}/1'V^{1/2}1$, $i = 1, 2, \dots, N$, then 
we will have balance on $Z$ in expectation: 


$$E_\pi(1_s'V_s^{-1/2}Z_s/n) = 1'Z/1'V^{1/2}1,$$


the subscript $\pi$ indicating that the expectation is with respect to the random sampling plan, not 
a prediction model. Furthermore, if our sampling plan is one under which $\operatorname{var}_\pi(1_s'V_s^{-1/2}Z_s/n)$ 
approaches zero as $n$ grows, then the probability that we will draw a sample that is 
badly unbalanced, say one in which $|1_s'V_s^{-1/2}Z_s/n - 1'Z/1'V^{1/2}1| > \delta$, can be made small 
by taking a large enough sample, $n$. That is, probability sampling can provide balance on $Z$ 
"in probability." 
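
Balance in expectation follows from linearity alone: for any fixed values $g_i$, the design expectation of $\sum_s g_i$ is $\sum_1^N \pi_i g_i$. The sketch below applies this with $g_i = z_i v_i^{-1/2}/n$ and inclusion probabilities proportional to $v_i^{1/2}$. The values of $v$ and $z$ are hypothetical, and the code only evaluates the expectation; it does not construct a sampling plan with these inclusion probabilities.

```python
import math


def inclusion_probs(v, n):
    """pi_i = n * v_i^{1/2} / sum_k v_k^{1/2}; usable only if every pi_i <= 1."""
    roots = [math.sqrt(vi) for vi in v]
    total = sum(roots)
    return [n * r / total for r in roots]


def expected_weighted_mean(z, v, n):
    """Design expectation of sum_s z_i v_i^{-1/2} / n, computed as
    sum_i pi_i * z_i * v_i^{-1/2} / n."""
    pi = inclusion_probs(v, n)
    return sum(p * zi / math.sqrt(vi) for p, zi, vi in zip(pi, z, v)) / n


v = [1.0, 4.0, 4.0, 9.0, 16.0]    # hypothetical variance-function values
z = [3.0, -1.0, 2.5, 0.0, 7.0]    # hypothetical overlooked regressor
n = 2

pi = inclusion_probs(v, n)
lhs = expected_weighted_mean(z, v, n)
rhs = sum(z) / sum(math.sqrt(vi) for vi in v)   # 1'Z / 1'V^{1/2}1
print(lhs, rhs)
```

The two sides agree for any choice of $z$, which is the sense in which the protection extends to regressors that were never considered when the sample was drawn.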

The strength of this result is in its scope: it applies for any matrix $Z$ of regressors whatsoever. 
In particular it applies for the matrix $X$ of regressors in our working model, as well as for 
overlooked regressors. The weakness of course is that it applies to the sample selection process, 
not to a result of that process. The sample actually drawn will, with predictable frequency, 
be badly unbalanced on the known regressors $X$. If balance on $X$ is important in a particular 
study, it should not be left to chance. (This was documented empirically by Royall and 
Cumberland 1981.) Restricted random sampling plans which guarantee that the selected sample 
will be balanced on $X$, such as Wallenius's "basket method" (1980), might represent a 
reasonable compromise strategy. 

It sometimes happens that a regressor Z that is ignored when the sample is selected becomes 
available afterwards, as in the case of post-stratification for example. If it is determined that 
the selected sample is badly balanced on Z, then probability sampling has failed to provide 
the expected protection against bias under M(X, Z : V); if it is too late to draw another sample, 
then to protect against the bias we must use an estimator that is unbiased under this model. 
That is, probability sampling does not guarantee approximate balance on Z; it only ensures 
that we have a good chance at approximate balance. It justifies confidence that a given sample 
is reasonably well balanced, in the absence of evidence to the contrary. It does not justify 
ignoring evidence of imbalance when it occurs. 

Note that under the above probability sampling plan the estimator $(1'V^{1/2}1)(1_s'V_s^{-1/2}Y_s)/n$, 
which is $\hat{T}(X : V)$ if both $V1$ and $V^{1/2}1$ belong to $\mathcal{M}(X)$ and $s$ is in $B(X : V)$, is unbiased with 
respect to the probability distribution generated by the sampling plan. But if the sample actually 
selected is not balanced on $X$ (i.e. if $s$ is not in $B(X : V)$) then this estimator is not unbiased 
under $M(X : V)$. 

Robust Model-Based Methods for Analytic Surveys 


T.M.F. SMITH and E. NJENGA¹ 


ABSTRACT 


This paper reviews the idea of robustness for randomisation and model-based inference for descriptive 
and analytic surveys. The lack of robustness for model-based procedures can be partially overcome by 
careful design. In this paper a robust model-based approach to analysis is proposed based on smoothing 
methods. 


KEY WORDS: Analytic surveys; Robustness; Smoothing methods. 


1. INTRODUCTION 


The concept of robustness in finite population inference from both the randomisation and 
model-based viewpoints is examined. In his seminal paper on a unified theory of sampling from 
finite populations Godambe (1955) not only proved his famous non-existence theorem but also 
made suggestions for robust finite population inference. He proposed a superpopulation model 
for the unit variables y; and suggested that strategies, that is the choice of both design and 
estimator, should be based on the model expectation of the sampling variance. He then imposed 
p-unbiasedness to obtain optimum strategies. These ideas were amplified in several papers 
including Godambe (1982) and Godambe and Thompson (1977). The results obtained include 
the apparent optimality of $\pi$ps sampling and the Horvitz-Thompson (1952) estimator. But the 
inefficiency of this strategy in multipurpose surveys is well known so we find these results on 
optimality and robustness less convincing than the apparently negative results on the 
foundations of inference. 

The lack of robustness of many model-based procedures is well known, see Hansen et al. 
(1983), and much of the work of Royall and his colleagues, for example Royall and Herson 
(1973a,b) has been devoted to constructing robust model-based strategies. After reviewing this 
work we propose a robust model-based method for estimating many complex statistics 
employed in the multivariate analysis of survey data which adjusts for the effects of selection. 
Our proposal is not a strategy but is a procedure which can be employed for the analysis of 
survey data after the sample is drawn. 


2. FORMAL STRUCTURE 


In order to examine robustness we must first structure finite population inference in the 
formal manner pioneered by Godambe (1955). We consider a population of N units with label 
set U = {1, 2, ..., N}. Attached to unit i is a vector of values, y_i, which will be measured 
on the sample units, and y_U = (y_1, ..., y_N) denotes the finite population matrix of values. 
A sample, s, is a subset of U drawn according to some rule. We are concerned here with rules 
based only on prior information, z_i, available on all the units in the population. Let z_U denote 
the prior information for the whole population, and let p(s | z_U) denote the sampling rule. 
Since the rule does not depend on y_U it is uninformative. If p(s | z_U) is a random sampling 
rule then it determines a probability distribution over 𝒮, the set of all samples, which is the 
basis for randomisation inference. The sample data comprise d_s = {(i, y_i): i ∈ s}. Let y_s 
denote the matrix of sample values; then an estimator is a function of the data, d_s, and of the 
prior information, z_U, which includes auxiliary information. We denote by E_p, V_p expectations 
and variances with respect to the distribution p(s | z_U). 
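
As a concrete illustration of the randomisation distribution, the following minimal Python sketch enumerates every sample of a toy population under simple random sampling without replacement and evaluates the randomisation expectation and variance of an estimator of the total exactly. The population values, the sample size and the name `expansion_estimator` are hypothetical choices made for illustration only; they are not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population: labels 1..5 with survey values y_i.
y = {1: 3, 2: 7, 3: 4, 4: 10, 5: 6}
N, n = len(y), 2

# SRS without replacement: every subset of size n has the same probability.
samples = list(combinations(sorted(y), n))
p = Fraction(1, len(samples))


def expansion_estimator(s):
    """Estimator of the population total: (N / n) times the sample sum."""
    return Fraction(N, n) * sum(y[i] for i in s)


T = sum(y.values())

# Expectation and variance over the set of all possible samples.
E_p = sum(p * expansion_estimator(s) for s in samples)
V_p = sum(p * (expansion_estimator(s) - E_p) ** 2 for s in samples)

print("T =", T, " E_p =", E_p, " V_p =", V_p)
```

Because the sampling rule is fully known, both quantities are computed without reference to any model for the y-values, which is the sense in which the randomisation distribution involves no unknown parameters.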

In a model-based approach it is further assumed that the population values y_U are random 
variables. A major problem with this approach is to specify a parametric probability model 
for the joint distribution of all these random variables, which must be based on all the prior 
information including that on the structures of, and relationships between, the units in the 
population. So models must reflect hierarchical groupings (clusters) and block groupings 
(strata), as well as correlations between the variables. This structure is potentially so complex 
that attention is usually restricted to means and covariance matrices. In general let f(y_U | z_U; λ) 
denote the conditional finite population distribution, where λ is a vector of unknown 
parameters. For predictive inference about finite population values, such as totals, this is a 
sufficient specification. For analytic inference about parameters in the marginal distribution 
of y we must additionally specify the marginal distribution of the prior values z_U. Let f(z_U; φ) 
denote this distribution, then the marginal distribution of y_U is 


$$ f(y_U; \theta) = \int f(y_U \mid z_U; \lambda)\, f(z_U; \phi)\, dz_U, \tag{2.1} $$


where θ = g(λ, φ) is the parameter of analytic interest. 


Applying the sampling rule to the population generates the data, d_s. The joint distribution 
of the data, d_s, and prior values, z_U, is 


$$
\begin{aligned}
f(d_s, z_U; \lambda, \phi) &= p(s \mid z_U) \int f(y_U \mid z_U; \lambda)\, f(z_U; \phi)\, dy_{\bar{s}} \\
&= p(s \mid z_U)\, f(y_s \mid z_U; \lambda)\, f(z_U; \phi),
\end{aligned} \tag{2.2}
$$


where s̄ denotes units not in s. This distribution is the basis of a model-based approach to 
inference. We let E_m, V_m denote expectations and variances with respect to the model. 

An implication of (2.2) is that the sampling rule, p(s | z_U), must be completely known to 
the person making the inference, as must the values of z_U. Absence of knowledge may render 
p(s | z_U) informative about the unobserved values y_s̄, see Scott (1977), Sugden and Smith 
(1984), in which case it cannot be taken outside the integral in (2.2). 

In this general set-up, embracing both random selection and modelling of values, randomisation 
inference corresponds to the case where the values y_U are unknown constants and the model 
distribution becomes degenerate at the point y_U. The only probability remaining is that in 
p(s | z_U), and this distribution over the set 𝒮 of all possible samples is the basis of 
randomisation inference. Note that the randomisation distribution is completely specified by knowledge 
of the sampling rule and of the prior values, z_U. It does not depend on any unknown 
parameters or on the survey values, y_U. This renders p(s | z_U) uninformative because there 
is less information in p(s | z_U) than in z_U itself. This accounts for the negative nature of 
Godambe’s results about randomisation inference. 

In contrast, model-based inference depends solely on the model component of (2.2), since 
p(s | z_U) contains no information about y_s̄. Predictive inferences about y_s̄ are made using the 
conditional distribution, f(y_s̄ | y_s, z_U; λ), independent of the randomisation distribution, 
p(s | z_U). The sampling rule is still important at the design stage, for it affects efficiency and 
robustness, but it has no role to play at the inference stage. Random sampling also provides 
a guarantee that the sampling rule is in fact uninformative, providing a scientifically 
acceptable sampling procedure. Model-based inferences may not be robust, however, because they 
may depend strongly on the choice of model, as demonstrated by many authors including 
Hansen et al. (1983). 

A compromise solution is to employ both components of (2.2), the model and the 
randomisation distribution, in the choice of estimator. This was proposed by Godambe (1955) 
as a positive response to his negative results. He proposed using as a criterion the model 
expectation of the randomisation variance, namely E_m V_p(t_s), where t_s is an estimator of a 
finite population total T. To find an optimum solution in a particular class of models Godambe 
restricted the choice of t_s to the class of p-unbiased estimators. This restriction has been much 
criticized and subsequently several authors, including Brewer (1979), Särndal (1980), Isaki and 
Fuller (1982), Little (1983), have proposed replacing exact unbiasedness by some form of 
approximate unbiasedness. This is usually expressed in the form of asymptotic design 
unbiasedness which requires the construction of a hypothetical sequence of finite populations 
with sizes tending to infinity. Although one may feel unhappy with this mathematical 
construction the suggestion that strategies, chosen before drawing the sample, should be based 
on considerations of the average under a model of a repeated sampling procedure is perfectly 
acceptable. The controversial issue is the choice of distribution for making inferences after 
the sample has been drawn. 


3. ROBUSTNESS 


Robustness is not a well defined concept in statistics. The Encyclopedia of Statistical Sciences, 
(Kotz and Johnson 1988), states that: 


“a robust procedure performs well not only under ideal conditions but also under 
departures from the ideal.” 


It goes on to say that both the nature of departures from the ideal and the meaning of “performs 
well” must be specified. With this broad definition in mind we now examine robustness for 
randomisation and model-based inference for finite population totals. The general perception 
is that randomisation inference is robust and that model-based inference is not. 

Godambe’s negative results can be interpreted to mean that randomisation inference is 
impossible in general. This is certainly true for heterogeneous populations, such as Royall’s 
axe, ass and box of horseshoes, or for populations with a few very extreme values, but for 
homogeneous populations the evidence overwhelmingly shows that randomisation inference 
is not only possible but also works in a well defined sense. 

Employing randomisation inference implies abandoning certain statistical principles, such 
as the likelihood principle, and replacing them by an appeal to the central limit theorem. The 
assertion is that under repeated random sampling using the specified rule p(s | z_U) 


$$ \frac{t_s - T}{\sqrt{V_p(t_s)}} \sim N(0,1), \tag{3.1} $$


for any t_s which is approximately p-unbiased for T, where both N and n are large, but n/N 
is small. Although proved formally only under SRS and related schemes, empirical evidence 
shows that the randomisation coverage properties of 95% confidence intervals of the form 


$$ t_s \pm 1.96 \sqrt{\hat{V}_p(t_s)}, \tag{3.2} $$


where V̂_p(t_s) is a consistent estimator of V_p(t_s), are approximately correct except for extreme 
designs or heterogeneous populations. 
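
The coverage claim can be checked empirically. The sketch below draws repeated simple random samples from a fixed, fairly homogeneous population and counts how often an interval of the form (3.2) covers the true total. The population (normal values with mean 50 and standard deviation 10), the sizes N = 2000 and n = 100, the number of replications and the seed are all assumptions chosen for illustration; the variance estimator is the usual SRS formula.

```python
import math
import random

random.seed(12345)  # fixed seed so the sketch is reproducible

N, n, reps = 2000, 100, 2000

# Hypothetical homogeneous finite population, held fixed across samples.
pop = [random.gauss(50.0, 10.0) for _ in range(N)]
T = sum(pop)

covered = 0
for _ in range(reps):
    s = random.sample(pop, n)            # SRS without replacement
    ybar = sum(s) / n
    s2 = sum((v - ybar) ** 2 for v in s) / (n - 1)
    t_s = N * ybar                       # expansion estimator of the total
    v_hat = N ** 2 * (1.0 - n / N) * s2 / n   # usual SRS variance estimator
    half_width = 1.96 * math.sqrt(v_hat)
    if t_s - half_width <= T <= t_s + half_width:
        covered += 1

coverage = covered / reps
print("empirical coverage:", coverage)
```

Only the sampling rule is random here; the y-values are never treated as draws from a model once the population has been fixed.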


Godambe and Thompson (1977) express their views about this approach in the following 
terms. 


“The use of such a confidence interval may be interpreted as follows: 


I: We are fairly sure a priori that y belongs to that subset of R^N for which the 
interval covers T(y) for 95% of all possible samples. 


II: There is no way that the sampled y-values, in conjunction with whatever other 
information we may have about the population, have altered the conviction 
in I. Thus even after sampling we believe that if the design were implemented 
again and again on this population the interval would cover T(y) approximately 
95% of the time. 


The robustness of the interval arises of course from the fact that only very weak 
and essentially informal conditions are required for the validity of its interpretation 
in the sense of I and II.” 


Very similar views are expressed by Hansen et al. (1983). 


“For probability-sampling designs the computed confidence intervals, for samples 
large enough, are valid in the sense that the randomization probability that the 
confidence intervals contain the value being estimated is equal to or greater than 
the nominal confidence coefficient, independent of the distribution of the 
characteristics among the elements of the population from which the sample is drawn.” 


“Robustness is usually understood to mean that inferences made from a sample 
are insensitive to violations of the assumptions that have been made. In principle, 
and ordinarily in fact, robustness is achieved in probability-sampling surveys by 
the use of sampling with known probabilities (i.e., randomization) and consistent 
estimators, and using a large enough sample that the central limit theorem applies, 
so that the estimates can be regarded as approximately normally distributed.” 


Note that this concept of robustness does not appear to require any specification of ideal 
conditions or of departures from the ideal. Random sampling and consistent estimation are 
all that is required. Brewer and Särndal (1983) are quite explicit: 


“Probability sampling methods are robust by definition; since they do not appeal 
to a model, there is no need to discuss what happens under model breakdown.” 


How can a statistical procedure be so robust? 


The reason is that the entire procedure is under the control of the statistician; no attempt 
is made to introduce “nature” into the structure. The randomisation distribution has a known 
form and does not depend on unknown parameters. There is no need to make an inference 
about p(s | z_U). Similarly the framework for inference is chosen by the statistician; it is 
repeated sampling using p(s | z_U). Different statisticians may use different sampling rules and 
estimators but the procedure represented by (3.1) gives approximately correct coverage 
properties in every case, and so is robust. This is an example of criterion robustness. However, 
any given procedure may not be efficient for the totals of some variables. We have already 
highlighted the well known inefficiency of the Horvitz-Thompson estimator which occurs when 
the survey variable is negatively correlated with the size variable. The search for efficiency 
robustness over a wide range of variables leads frequently to the recommendation that the design 
should be a stratified SRS design, see, for example, Godambe (1982), Hansen et al. (1983). 

In model-based inference the statistician is playing the game of modelling “nature”. 
Probability distributions such as f(y_U | z_U; λ) are chosen by the statistician but their true form 
is unknown, as also are the values of the parameters. If an estimator, t_s, of T, is chosen then 
its expected value and variance will depend on the choice of model. Deviations from the model 
may lead to changes in the mean and variance and hence to changes in confidence intervals 
based on applying the central limit theorem to the model residuals. In model-based inference 
the robustness due to the central limit theorem is more limited than that in randomisation 
inference since it applies only to the residuals. Some model deviations can be controlled by 
choosing an appropriate design, as in Royall and Herson (1973a,b), but there can never be 
complete robustness. The framework for inference is also completely different. Instead of 
employing the unconditional distribution based on repeated sampling, model-based inference 
employs the conditional distribution given the selected sample s. 

Can these two positions ever be reconciled? Before sampling, when choosing strategies, they 
can. Both schools of thought have the same prior information, z_U, and both use models to 
suggest designs and estimators and choose strategies based on the overall mean squared error 


$$ E_m E_p (t_s - T)^2. \tag{3.3} $$


Randomisers usually impose a constraint such as approximate p-unbiasedness while modellers 
may impose approximate model unbiasedness and the two positions can be reconciled by 
choosing a sample design such that the model-unbiased estimator is also p-unbiased. This 
strategy utilizes the full structure of (2.2) and gets the best of both worlds. 

After sampling there appears to be little hope of reconciliation. The two frameworks for 
inference are quite different, one being based on an unconditional distribution the other on 
a conditional distribution. Royall and Cumberland (1981) have demonstrated convincingly how 
much difference this can make. Incidentally they have also demonstrated the lack of robustness 
of some of the conventional model-based variance estimators. 

One case where reconciliation is possible occurs in stratified sampling. Both randomisers 
and modellers have converged on stratified sampling as a robust design, and for SRS within 
strata model-based and p-based inferences coincide. This provides evidence for one of the few 
positive results in sample surveys: 


Theorem: Stratification is a good thing. 


Proof: See Cochran (1977, Ch.5). 


Stratification allows us to look at the problem of robustness more closely. If both a randomiser 
and a modeller adopt the same stratification, and both also adopt the same SRS design within 
strata, then for a given sample they will both make identical inferences. Now suppose on the 
basis of further analysis or evidence it is agreed that an extra level of stratification should have 
been used. How does this affect the respective inferences? The modeller now has to say that 
the original model was misspecified and hence that inferences from that model would be biased. 
Both the estimator and the variance of the original model would be wrong. The randomiser, 
however, can say that the extra information is interesting, and could be used to post-stratify 
the original results, but that it can also be ignored if necessary because the original inferences 
are still valid in the sense defined in (3.2). All that has happened is a possible loss of efficiency. 
In one case the original inference is condemned as not being robust, in the other case the same 
inference is apparently robust. The modeller's bias, when averaged over repeated samples, is 
transformed for the randomiser into a component of sampling variance, or a loss of efficiency. 
So if initially randomisers and modellers start from the same position then deviations from 
that position are interpreted differently. In one case it is a bias, in the other case a variance. 
Can this really be called robust in one case and not robust in the other? 


4. ANALYTIC INFERENCE 


In analytic inference the target for inference is no longer a known function of the finite 
population values, y_U, so that even if n = N there is still residual uncertainty in the inference. 
Examples are tests of hypotheses, where the null hypothesis of no difference is meaningless 
in a fixed finite population. Possible targets for inference are the parameters λ, φ, of the model 
(2.2), or functions of them such as θ in (2.1). Other targets are the parameters in finite 
tions related to the given finite population in some known way, perhaps through a spatial or 
time series structure. Methods for analytic inference have recently been reviewed by Skinner 
et al. (1989). 

The starting point for analytic inference is the specification of the superpopulation model 
which aims to show how the finite population is related to the superpopulation. A common 
assumption is that the finite population is generated as IID random variables from a super- 
population. Whether this can be justified for populations with structure, such as clustering 
or stratification, is debatable. In this paper we assume that it is true, at least within broadly 
defined strata. With this assumption a SRS from the finite population is itself an IID sample 
from the superpopulation and inferences can be made directly from the sample to the 
superpopulation. If the sample is not a SRS, but is drawn using a design p(s | z_U) which uses 
the information in z_U, then the achieved sample is no longer an IID sample from the 
superpopulation. This is the problem of selection and the effect of selection must be taken into 
account in the final inference. 


The superpopulation model establishes a hierarchy, 
superpopulation ⊃ finite population ⊃ sample. 


If the finite population is IID from the superpopulation then finite population parameters, 
such as means, are related to the corresponding superpopulation parameters by 


$$ \bar{y}_U = E_m(\bar{y}_U) + O_p(N^{-1/2}). \tag{4.1} $$


Since N is usually very large an inference about ȳ_U is a good approximation to an inference 
about E_m(ȳ_U). Inferences about ȳ_U using the p-weights associated with the sampling rule 
p(s | z_U) are the basis of the randomisation approach to analytic inference. Note that this 
approach depends strongly on the IID assumption for the finite population. 

For more complex analyses, such as logistic regression analysis, the pseudo-MLE approach 
in Skinner et al. (1989, sec. 3.4.4) and Binder (1983) can be used to define both the finite 
population parameter of interest and the randomisation estimator. The finite population 
parameter is usually defined through an estimating equation, see Godambe (1960) and Godambe 
and Thompson (1986). As in Section 3 confidence intervals are based on the unconditional 
distribution generated by repeated random sampling. 

Model-based analytic inference is based on the complete model of the survey population y_U, 
the design variables z_U, and the sample selection rule p(s | z_U), that is 


Survey Methodology, December 1992 193 


$$ f(y_U, z_U, s; \lambda, \phi) = f(y_U \mid z_U; \lambda)\, f(z_U; \phi)\, p(s \mid z_U). \tag{4.2} $$


For random sampling rules the selection scheme leaves the conditional distribution f(y_U | z_U; λ) 
unchanged, but changes the marginal distribution of z_U from f(z_U; φ) before selection to 


$$ f_s(z_U; \phi) = f(z_U; \phi)\, p(s \mid z_U) \tag{4.3} $$


after selection. Thus inferences about λ are unaffected by selection but inferences about φ, 
and hence about θ = g(λ, φ), the parameters of the marginal distribution f(y_U; θ), are 
affected by selection. For these latter inferences the sample data cannot be treated as though 
they were a SRS from the superpopulation model. 

If we assume that the superpopulation distributions are multivariate normal then 


(i) E(y | z) is linear in z, and 


(ii) V(y | z) = K, independent of z. 


Under these assumptions of linearity and homoscedasticity a model-based estimator of the 
covariance matrix, Σ_yy, of y is given by 


$$ \hat{\Sigma}_{yy} = V_{yys} + B_{yzs}\,(\Sigma_{zz} - V_{zzs})\,B_{yzs}^{T}, \tag{4.4} $$


as shown in Skinner et al. (1989, Section 6.4), where V_yys, V_zzs, B_yzs are sample covariance 
matrices and a matrix of regression coefficients based on treating the sample data as IID from 
the conditional distribution f(y_U | z_U; λ). We call (4.4) the Pearson adjusted estimator after 
Pearson (1903). 
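
A minimal Python sketch of a Pearson-type adjustment for a scalar design variable z follows. It assumes the standard form of the adjustment: the sample covariance of y plus the product of the sample regression slopes on z and the difference between the population variance of z and its sample variance. The function names, the argument `pop_var_z` and the toy numbers are hypothetical, and the sketch is not claimed to reproduce the exact notation of (4.4).

```python
def cov(u, v):
    """Sample covariance of two equal-length sequences (divisor n - 1)."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v)) / (n - 1)


def pearson_adjusted(y_cols, z_s, pop_var_z):
    """Adjust the sample covariance matrix of y for selection on scalar z.

    y_cols    : list of columns, one list of sample values per y-variable
    z_s       : sample values of the design variable
    pop_var_z : variance of z in the whole population (assumed known)
    """
    v_zz = cov(z_s, z_s)
    # Sample regression slope of each y-variable on z.
    slopes = [cov(col, z_s) / v_zz for col in y_cols]
    p = len(y_cols)
    return [
        [cov(y_cols[a], y_cols[b]) + slopes[a] * slopes[b] * (pop_var_z - v_zz)
         for b in range(p)]
        for a in range(p)
    ]


# Toy sample in which z is under-dispersed relative to the population.
z_s = [1.0, 2.0, 3.0, 4.0]
y_cols = [[2.0 * z for z in z_s], [1.0, 0.0, 1.0, 0.0]]
adjusted = pearson_adjusted(y_cols, z_s, pop_var_z=5.0)
unadjusted = pearson_adjusted(y_cols, z_s, pop_var_z=cov(z_s, z_s))
print(adjusted)
```

When the sample variance of z equals the population variance the correction term vanishes and the adjusted matrix reduces to the ordinary sample covariance matrix, which is the behaviour expected under proportional or simple random designs.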

Theoretical and empirical studies by Pfeffermann and Holmes (1985), Holmes (1987) and 
Njenga (1990), have shown that model-based inferences from (4.4) are not robust to departures 
from the assumptions of linearity and homoscedasticity. Nathan and Holt (1980) proposed 
a p-weighted version of (4.4) as a more robust alternative. This estimator is formed by replacing 
all the equally weighted sums in (4.4) by the corresponding p-weighted sums. The resulting 
estimator is called the probability weighted maximum likelihood estimator (pwml). The 
properties of this estimator have been studied empirically and theoretically in Holmes (1987), 
Njenga (1990) and in Skinner, Holt and Smith (1989, Ch.8). It was found to have similar 
unconditional properties to alternative p-weighted estimators, such as the Horvitz-Thompson 
estimator of Σ_yy, and superior conditional properties. In the simulation study in Section 6 the 
pwml estimator is taken to represent the entire class of p-weighted estimators. Since the 
p-weighted version of V_yys in (4.4) is a design consistent estimator of Σ_yy, the resulting 
estimator is a design consistent estimator of Σ_yy. We now investigate a new robust 
model-based procedure. 


5. A NONPARAMETRIC MOMENT-BASED ESTIMATOR 


In this section we attempt to overcome the lack of robustness of model-based estimators 
such as (4.4) which depend strongly on assumptions of linearity and homoscedasticity. If the 
finite population is realized as IID observations from the superpopulation and if interest centres 
on the superpopulation parameters μ_y, Σ_yy in the marginal distribution of y, then the approach 
we adopt uses the fact that the sample data are IID from the conditional distribution f(y | z) 
while the design variables z_U are an IID sample of size N from the marginal distribution of 
z. For simplicity we assume that only one design variable has been used, such as a measure 
of size, so that z is a scalar random variable. 

We assume that the conditional mean and covariance matrix of y given z are smooth 
functions of z of unknown form. Let 


$$ E(y \mid z) = \mu(z), \tag{5.1} $$


$$ V(y \mid z) = \Sigma_{yy}(z). \tag{5.2} $$


These parametric functions can be estimated using some form of nonparametric estimation 
such as linear smoothing. Examples of linear smoothing methods are kernel estimation, see, 
for example, Gasser and Müller (1979), local regression, see, for example, Cleveland (1979), 
and smoothing splines, see, for example, Silverman (1985). We propose estimating the functions 
in (5.1) term by term using the kernel estimator 


$$ \hat{\mu}(z) = \sum_{j \in s} W_k(z, z_j)\, y_j. \tag{5.3} $$


We constrain the sum of the weights to be unity so that the estimator is a weighted average 
and employ the Gaussian kernel with k being the bandwidth. These estimators have been 
extensively studied and a recent review is Gasser and Engel (1990). 
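
A minimal sketch of a kernel-weighted average of this type, with Gaussian weights normalised to sum to one, is given below. The function names `nw_weights` and `nw_mean` and the toy data are hypothetical. The sketch makes no attempt to guard against numerical underflow when the evaluation point is very far from every sample point relative to the bandwidth.

```python
import math


def nw_weights(z, z_s, k):
    """Gaussian kernel weights at the point z, normalised to sum to one.

    z_s : sample values of the design variable
    k   : bandwidth
    """
    raw = [math.exp(-((z - zj) ** 2) / (2.0 * k ** 2)) for zj in z_s]
    total = sum(raw)  # zero only if every term underflows
    return [r / total for r in raw]


def nw_mean(z, z_s, y_s, k):
    """Kernel-weighted average of the sample y-values at the point z."""
    weights = nw_weights(z, z_s, k)
    return sum(w * yj for w, yj in zip(weights, y_s))


# Toy sample: two points placed symmetrically about z = 1.
z_s = [0.0, 2.0]
y_s = [1.0, 3.0]
midpoint_fit = nw_mean(1.0, z_s, y_s, k=0.7)
constant_fit = nw_mean(0.3, z_s, [5.0, 5.0], k=0.7)
print(midpoint_fit, constant_fit)
```

Because the weights sum to one, a constant response is reproduced exactly whatever the bandwidth, and the fit at a point equidistant from two sample points is their simple average.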


The structure in (5.1) and (5.2) implicitly assumes that we can write 


$$ y_j = \mu(z_j) + e_j, \quad j \in s, \tag{5.4} $$

so that 

$$ e_j = y_j - \mu(z_j), \quad j \in s. \tag{5.5} $$

Thus 

$$ \hat{e}_j \hat{e}_j^{T} = (y_j - \hat{\mu}(z_j))(y_j - \hat{\mu}(z_j))^{T} \tag{5.6} $$


is an estimator of Σ_yy(z_j). Applying a linear smoother to each term σ_ab(z_j) of Σ_yy(z_j) gives 


$$ \hat{\sigma}_{ab}(z) = \sum_{j \in s} W_h(z, z_j)\, \hat{e}_{aj}\, \hat{e}_{bj}, \tag{5.7} $$


where W_h(z, z_j) is a kernel with bandwidth h which will usually be wider than the bandwidth 
k chosen for the estimation of the conditional mean, (5.3). 


The estimates of the marginal moments then employ the standard results that 


$$ \mu_y = E_z(\mu(z)), \tag{5.8} $$


$$ \Sigma_{yy} = E_z(\Sigma_{yy}(z)) + V_z(\mu(z)). \tag{5.9} $$
Now 

$$ \mu_y = \int \mu(z) f(z)\, dz, $$

and our proposed estimator is 

$$ \hat{\mu}_y = \int \hat{\mu}(z) \hat{f}(z)\, dz. \tag{5.10} $$

Since N is large we propose using the empirical p.d.f. (Parzen 1962), given by 

$$ \hat{f}(z) = \begin{cases} N^{-1}, & z = z_j,\ j = 1, \ldots, N, \\ 0, & \text{otherwise}. \end{cases} \tag{5.11} $$


Substituting in (5.10) gives the estimator 

$$ \hat{\mu}_y = N^{-1} \sum_{j=1}^{N} \hat{\mu}(z_j). \tag{5.12} $$


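To estimate Σ_yy we adopt a similar procedure for the first term of (5.9). The second term 
can be written 


$$ V(\mu(z)) = \int (\mu(z) - \mu_y)(\mu(z) - \mu_y)^{T} f(z)\, dz. \tag{5.13} $$


For our estimator we propose 

$$ \hat{V}(\mu(z)) = N^{-1} \sum_{j=1}^{N} (\hat{\mu}(z_j) - \hat{\mu}_y)(\hat{\mu}(z_j) - \hat{\mu}_y)^{T}. \tag{5.14} $$

Thus the proposed estimator of Σ_yy is 

$$ \hat{\Sigma}_{yy} = N^{-1} \sum_{j=1}^{N} \left[ \hat{\Sigma}_{yy}(z_j) + (\hat{\mu}(z_j) - \hat{\mu}_y)(\hat{\mu}(z_j) - \hat{\mu}_y)^{T} \right]. \tag{5.15} $$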


Njenga (1990) examines the asymptotic statistical properties of these estimators. 
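
The construction in this section can be sketched in a few lines of Python. The sketch below is an illustrative reading of the procedure, not the authors' code: it smooths the sample y-values to estimate the conditional mean, smooths the products of residuals to estimate the conditional covariance, and then averages both over all N population values of z, adding the between-z variation of the smoothed means. The function names, the argument layout and the toy data are assumptions.

```python
import math


def kernel_weights(z, z_s, bw):
    """Gaussian kernel weights at z, normalised to sum to one."""
    raw = [math.exp(-((z - zj) ** 2) / (2.0 * bw ** 2)) for zj in z_s]
    total = sum(raw)
    return [r / total for r in raw]


def smooth(z, z_s, values, bw):
    """Kernel-weighted average of `values` at the point z."""
    return sum(w * v for w, v in zip(kernel_weights(z, z_s, bw), values))


def moment_estimates(z_pop, z_s, y_s, k, h):
    """Return (mu_hat, sigma_hat) for the marginal distribution of y.

    z_pop : z-values for all N population units
    z_s   : z-values for the sampled units
    y_s   : list of sample y-vectors (tuples), aligned with z_s
    k, h  : bandwidths for the mean and for the residual products
    """
    n, p = len(z_s), len(y_s[0])
    cols = [[y[a] for y in y_s] for a in range(p)]
    # Residuals from the smoothed conditional mean at the sample points.
    resid = [[cols[a][j] - smooth(z_s[j], z_s, cols[a], k) for j in range(n)]
             for a in range(p)]
    # Smoothed conditional mean at every population value of z.
    mu_pop = [[smooth(z, z_s, cols[a], k) for a in range(p)] for z in z_pop]
    big_n = len(z_pop)
    mu_hat = [sum(m[a] for m in mu_pop) / big_n for a in range(p)]
    sigma_hat = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            products = [resid[a][j] * resid[b][j] for j in range(n)]
            # Average smoothed conditional covariance over the population.
            within = sum(smooth(z, z_s, products, h) for z in z_pop) / big_n
            # Variation of the smoothed conditional means over the population.
            between = sum((m[a] - mu_hat[a]) * (m[b] - mu_hat[b])
                          for m in mu_pop) / big_n
            sigma_hat[a][b] = within + between
    return mu_hat, sigma_hat


# Toy check: y_1 = z and y_2 = 2z exactly, with the whole population sampled.
z_pop = [0.0, 1.0, 2.0, 3.0, 4.0]
y_s = [(z, 2.0 * z) for z in z_pop]
mu_hat, sigma_hat = moment_estimates(z_pop, z_pop, y_s, k=0.05, h=0.05)
b12 = sigma_hat[0][1] / sigma_hat[1][1]
print(mu_hat, sigma_hat, b12)
```

With a very narrow bandwidth the smoother interpolates the toy data, so the residual term is negligible and the estimate reduces to the population moments of z, which gives a simple sanity check. A regression coefficient between two components of y is then obtained as a ratio of elements of the estimated covariance matrix, as in the last line.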


One of the main reasons for estimating Σ_yy is to carry out some form of multivariate analysis, 
such as a regression analysis between two or more of the components of y. In the next section 
we report the results of a simulation study in which the simple regression coefficient between 
two y-variables is estimated from stratified random samples with different sampling fractions. 


6. ESTIMATING A REGRESSION COEFFICIENT: A SIMULATION STUDY 


Let y = (y_1, y_2)^T with mean μ_y = (μ_1, μ_2)^T and covariance matrix 

$$ \Sigma_{yy} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}. $$

We are interested in estimating a function of Σ_yy, the simple linear regression coefficient, 

$$ B_{12} = \sigma_{12} / \sigma_2^2. \tag{6.1} $$


The elements of Σ_yy will be estimated using: 
(i) the Pearson adjusted estimator of Σ_yy based on (4.4), 
(ii) the probability weighted version of (4.4), 


(iii) a kernel estimator based on (5.14). 


The corresponding estimators of B_12, or of its finite population equivalent, are denoted 
B̂_12,ml, B̂_12,pwml and B̂_12,nw respectively. The estimator B̂_12,ml is indexed by “ml” because it is 
also the MLE under a multivariate normal model. The estimator B̂_12,nw is indexed “nw” after 
Nadaraya (1964) and Watson (1964). The first two estimators were chosen because of their 
good performance in previous simulation studies, see Skinner et al. (1989, Ch.8). 

We carried out three types of simulation study. In the first simulation study we generated 
a multivariate normal population to compare the performance of the new estimator with the 
maximum likelihood estimator which is optimal for this population. In the second simulation 
study we generated a quadratic homoscedastic population to compare the estimators when only 
the linearity assumption is violated. In the last simulation study we compared the estimators 
when the structure of the population is unknown, i.e. we used a ‘real’ population. In these 
simulation studies we carried out both conditional and unconditional analyses. The former 
allow us to assess whether a particular estimator is good in some samples and poor for others 
whereas the latter averages over all possible samples for a particular design. 


The new estimator uses the Gaussian kernel 

$$ W_k(z_i, z_j) = c_i \exp\{-(z_i - z_j)^2 / 2k^2\}, \quad i \in U,\ j \in s, $$


where c_i^{-1} = Σ_{j ∈ s} exp{−(z_i − z_j)^2 / 2k^2}. A simulation with different values of the 
bandwidth k showed that the mean squared error was relatively constant for a wide range of values 
of k and that this was achieved by trading off bias against variance. We selected values for 
k that gave relatively small values for the bias for each stratified sample design. 

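Since the ‘real’ population available to us was 6,962 observations from the 1975 UK Family 
Expenditure Survey we constructed all three populations to be of this size with mean vector 
and covariance matrix 


$$ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_z \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{1z} \\ \sigma_{12} & \sigma_2^2 & \sigma_{2z} \\ \sigma_{1z} & \sigma_{2z} & \sigma_z^2 \end{pmatrix}. $$


The actual values of Σ are shown in Table 6.1. 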


The design variable is based on the expenditure on food, the independent variable is the 
total income and the dependent variable is the total expenditure. This finite population was 
stratified into five strata according to increasing values of the design variable, such that the first 
stratum contains 1,393 units with lowest values of z, second, third, fourth contain 1,392 units 
each and the fifth contains the last 1,393 units with the highest z values. 
Table 6.1 
Parameter Values from the Real Population 

Variable                          SD      Correlation matrix 
y_1   Expenditure on all items    0.668   1 
y_2   Total income                0.849   [illegible]   1 
z     Expenditure on food         0.658   0.41          0.28   1 


Table 6.2 
Stratified Sample Designs 

Sample design                     n_1   n_2   n_3   n_4   n_5   Symbol 
D1   Proportional allocation      20    20    20    20    20    △ 
D2   Increasing allocation         5     9    16    30    40    ▽ 
D3   U-shaped allocation          40     8     4     8    40    + 


The sample designs used were based on those used by Holt, Smith and Winter (1980). 
Denote a stratified random sampling design by (n_1, ..., n_5) with n_h units selected from the h-th 
stratum, h = 1, ..., 5; then the designs are shown in Table 6.2, together with the symbols 
used in the plots. 
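
The stratification and the three allocations can be sketched as follows. This is an illustrative sketch only: the population of z-values is simulated, the function names are hypothetical, and the first entry of design D2 is read as 5 so that the allocation sums to 100. Units are ordered by z, cut into five strata of the stated sizes, and sampled by SRS within strata, so the inclusion probability of a unit in stratum h is n_h / N_h.

```python
import random

STRATUM_SIZES = [1393, 1392, 1392, 1392, 1393]
DESIGNS = {
    "D1": [20, 20, 20, 20, 20],   # proportional allocation
    "D2": [5, 9, 16, 30, 40],     # increasing allocation (first entry assumed)
    "D3": [40, 8, 4, 8, 40],      # U-shaped allocation
}


def stratum_bounds(sizes):
    """Half-open index ranges of consecutive strata in the z-ordering."""
    bounds, start = [], 0
    for size in sizes:
        bounds.append((start, start + size))
        start += size
    return bounds


def draw_stratified_sample(z_pop, allocation, rng):
    """Return a list of (unit index, inclusion probability) pairs."""
    order = sorted(range(len(z_pop)), key=lambda i: z_pop[i])
    sample = []
    for (lo, hi), n_h in zip(stratum_bounds(STRATUM_SIZES), allocation):
        stratum = order[lo:hi]
        pi_h = n_h / len(stratum)          # SRS within the stratum
        for i in rng.sample(stratum, n_h):
            sample.append((i, pi_h))
    return sample


rng = random.Random(1)
z_pop = [rng.random() for _ in range(sum(STRATUM_SIZES))]  # hypothetical z
sample = draw_stratified_sample(z_pop, DESIGNS["D2"], rng)
estimated_N = sum(1.0 / pi for _, pi in sample)
print(len(sample), estimated_N)
```

The reciprocals of the inclusion probabilities are the p-weights used by the probability weighted estimators; summing them over any stratified sample recovers the population size.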

For each of the stratified sample designs we selected 1,000 independent samples of size 100 
from the finite population. The sampling distributions of the various statistics under 
investigation were estimated from these 1,000 repeated samples. We obtain the unconditional results 
by averaging the statistics under investigation over all the 1,000 samples. 

To assess the conditional properties of the estimators the 1,000 samples were divided into 
20 groups of 50 samples each according to increasing values of Δ_zz = (s_zz − S_zz)/S_zz for the 
nw and ml estimators where 

$$ S_{zz} = N^{-1} \sum_{i \in U} (z_i - \bar{z}_U)^2, \qquad s_{zz} = n^{-1} \sum_{i \in s} (z_i - \bar{z}_s)^2, \qquad \bar{z}_U = N^{-1} \sum_{i \in U} z_i, \qquad \bar{z}_s = n^{-1} \sum_{i \in s} z_i, $$

and of Δ*_zz = (s*_zz − S_zz)/S_zz for the pwml estimators where 

$$ s^{*}_{zz} = \sum_{i \in s} w_i (z_i - \bar{z}^{*})^2, \qquad \bar{z}^{*} = \sum_{i \in s} w_i z_i, \qquad w_i = (N \pi_i)^{-1}, $$

and π_i denotes the probability of including the i-th unit in the sample, such that the first group 
contained the 50 samples with the smallest values of Δ_zz (or Δ*_zz) and so on up to the 20th 
group which contains the 50 samples with the largest values of Δ_zz (or Δ*_zz). We assume that 
the variation in Δ_zz (or Δ*_zz) within each group is small. The conditional distribution of the 
various estimators given Δ_zz (or Δ*_zz) can then be plotted. 
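
The grouping step can be sketched as below. The sketch assumes that the relative discrepancy in the variance of z has already been computed for each replicate sample; the function names, the divisor n in the sample variance and the synthetic inputs are assumptions made for illustration.

```python
def delta_zz(z_sample, pop_var_z):
    """Relative difference between the sample and population variance of z."""
    n = len(z_sample)
    zbar = sum(z_sample) / n
    s_zz = sum((z - zbar) ** 2 for z in z_sample) / n   # divisor n assumed
    return (s_zz - pop_var_z) / pop_var_z


def group_means(deltas, estimates, n_groups=20):
    """Sort replicates by delta, cut into equal groups, average within groups.

    Returns a list of (mean delta, mean estimate) pairs, one per group.
    """
    pairs = sorted(zip(deltas, estimates))
    size = len(pairs) // n_groups
    groups = []
    for g in range(n_groups):
        chunk = pairs[g * size:(g + 1) * size]
        mean_delta = sum(d for d, _ in chunk) / size
        mean_estimate = sum(e for _, e in chunk) / size
        groups.append((mean_delta, mean_estimate))
    return groups


# Synthetic replicates: 1,000 discrepancy values and matching estimates.
deltas = [i / 1000.0 for i in range(1000)]
estimates = [0.6 + 0.1 * d for d in deltas]
groups = group_means(deltas, estimates)
print(groups[0], groups[-1])
```

Plotting the second element of each pair against the first gives a scattergram of group means of the kind reported in the figures; a trend in that plot indicates conditional bias that an unconditional average would hide.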

The biases, standard deviations and mean square errors reported in simulation studies 1 and 2 
are computed around the value of B_12 in the finite population generated from the model. This 
enables them to be compared with the values generated from the real finite population in 
simulation study 3. 
Table 6.3 
Unconditional Absolute Biases of the Three Estimators of B_12 
N = 6,962, n = 100, True Value B_12 = 0.595 

                 Absolute biases of 
Sample design    B̂_12,ml    B̂_12,pwml    B̂_12,nw 
D1               0.0003     0.0003       0.0185 
D2               0.0007     0.0019       0.0269 
D3               0.0026     0.0018       0.0159 


Table 6.4 
Unconditional Standard Deviation of the Three Estimators of B_12 

                 Standard deviations 
Sample design    B̂_12,ml    B̂_12,pwml    B̂_12,nw 
D1               0.0500     0.0500       0.0507 
D2               0.0522     0.0693       0.0531 
D3               0.0486     0.0710       0.0503 


Table 6.5 
Unconditional Mean Square Errors of the Three Estimators of B_12 

                 Mean square errors 
Sample design    B̂_12,ml    B̂_12,pwml    B̂_12,nw 
D1               0.0025     0.0025       0.0029 
D2               0.0027     0.0048       0.0035 
D3               0.0024     0.0050       0.0028 


Simulation Study 1 


In the first simulation study the 6,962 finite population values were generated from a 
multivariate normal distribution with correlation matrix given in Table 6.1. These data should 
be favourable to the estimator B̂_12,ml. 

The unconditional biases, standard deviations and mean squared errors are shown in Tables 
6.3, 6.4 and 6.5. 

As expected the estimator B̂_12,ml is best in terms of mean squared error. The new estimator 
B̂_12,nw does surprisingly well; it has a larger bias but a similar standard deviation. The size of 
the bias for a very smooth (linear) population is consistent with the results in other studies, 
see Gasser and Engel (1990). A very wide bandwidth is needed to capture a very smooth 
function. 
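
The three tables are linked by the identity that mean square error equals squared bias plus variance. The short Python check below re-enters the published values of Tables 6.3, 6.4 and 6.5 and confirms that they agree with this identity to within rounding; the dictionary layout and variable names are arbitrary choices for the sketch.

```python
ESTIMATORS = ("ml", "pwml", "nw")

# Values transcribed from Tables 6.3 (bias), 6.4 (SD) and 6.5 (MSE).
BIAS = {"D1": (0.0003, 0.0003, 0.0185),
        "D2": (0.0007, 0.0019, 0.0269),
        "D3": (0.0026, 0.0018, 0.0159)}
SD = {"D1": (0.0500, 0.0500, 0.0507),
      "D2": (0.0522, 0.0693, 0.0531),
      "D3": (0.0486, 0.0710, 0.0503)}
MSE = {"D1": (0.0025, 0.0025, 0.0029),
       "D2": (0.0027, 0.0048, 0.0035),
       "D3": (0.0024, 0.0050, 0.0028)}

gaps = {}
for design in BIAS:
    for idx, name in enumerate(ESTIMATORS):
        implied = BIAS[design][idx] ** 2 + SD[design][idx] ** 2
        gaps[(design, name)] = abs(implied - MSE[design][idx])

max_gap = max(gaps.values())
print("largest discrepancy:", max_gap)
```

The decomposition also makes the trade-off visible: for the nw estimator the squared bias is a noticeable share of the mean square error, whereas for the pwml estimator under D2 and D3 almost all of the error is variance.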
Figure 6.1 Scattergram of group means of B̂_12,ml 

Figure 6.2 Scattergram of group means of B̂_12,pwml 

Figure 6.3 Scattergram of group means of B̂_12,nw 


The conditional plots are shown in Figures 6.1, 6.2 and 6.3. These plots show that there 
is no additional pattern to the bias beyond the absolute level of bias shown in Table 6.3. Previous 
studies have shown consistent patterns of bias for SRS estimators and simple p-weighted 
estimators, see Skinner et al. (1989, Chs. 7 and 8). 


Simulation Study 2 


Repeated sampling from a quadratic homoscedastic population 


This simulation study is similar to one carried out by Holmes (1987). We generated 6,962 
finite population values of (y_1i, y_2i, z_i), i = 1, ..., 6,962, by first generating a value of z_i from 
the uniform distribution U(0,10). Using this generated value of z_i the corresponding values 
of y_1i and y_2i are obtained from the relationships 


$$ y_{2i} = m_2 + A_2 z_i + R_2 z_i^2 + \epsilon_{2i} $$

and 

$$ y_{1i} = m_1 + A_1 z_i + R_1 z_i^2 + \epsilon_{1i}, $$


where ε_2i and ε_1i are random variables from normal distributions with mean zero and constant 
variance, and R_1 ≠ 0, R_2 ≠ 0. Following Holmes (1987) we chose the parameters in these 
expressions so that the regressions of y_1 and y_2 on z are monotonically increasing functions 
of z and the regression of y_1 on y_2 is approximately linear so that the regression coefficient 
B_12 will be a meaningful parameter to estimate. 
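
A population of this kind can be generated as in the following sketch. The parameter values are hypothetical, since the values actually used are not given in this section; they are chosen only so that both regressions on z increase over the interval (0, 10). The function name and argument layout are likewise assumptions.

```python
import random


def generate_quadratic_population(N, m, A, R, sigma, rng):
    """Generate N triples (y_1, y_2, z) with quadratic regressions on z.

    m, A, R : pairs of intercepts, linear and quadratic coefficients
    sigma   : pair of constant error standard deviations (homoscedastic)
    """
    population = []
    for _ in range(N):
        z = rng.uniform(0.0, 10.0)
        y1 = m[0] + A[0] * z + R[0] * z ** 2 + rng.gauss(0.0, sigma[0])
        y2 = m[1] + A[1] * z + R[1] * z ** 2 + rng.gauss(0.0, sigma[1])
        population.append((y1, y2, z))
    return population


rng = random.Random(2024)
# Hypothetical parameters: positive A and R keep both regressions increasing.
population = generate_quadratic_population(
    N=6962, m=(1.0, 2.0), A=(0.5, 0.4), R=(0.05, 0.06), sigma=(1.0, 1.0), rng=rng
)
print(len(population), population[0])
```

Because the error variance does not depend on z, only the linearity assumption behind the Pearson adjusted estimator is violated in such a population, which isolates the effect the second simulation study is designed to examine.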
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Table 6.6 


Unconditional Standard Deviation of the Three Estimators of By 
N = 6,962, n = 100 True Value By, = 0.857 


Absolute biases of 
Sample design 


Biz mi Bie cul Bio 

D1 0.0119 0.0119 0.0171 

D2 0.0923 0.0132 0.5556 

D3 0.0124 0.0098 0.0104 
Table 6.7

Unconditional Standard Deviation of the Three Estimators of $B_{12}$

                     Standard deviations
Sample design    $\hat{B}_{12,ml}$    $\hat{B}_{12,pwml}$    $\hat{B}_{12,nw}$
D1               0.0877               0.0877                 0.0877
D2               0.0972               0.1230                 0.1150
D3               0.0785               0.1110                 0.0797
Table 6.8

Unconditional Mean Square Errors of the Three Estimators of $B_{12}$

                     Mean square errors
Sample design    $\hat{B}_{12,ml}$    $\hat{B}_{12,pwml}$    $\hat{B}_{12,nw}$
D1               0.0078               0.0078                 0.0080
D2               0.0180               0.0153                 0.0164
D3               0.0063               0.0124                 0.0065


The unconditional results of the three estimators of the regression coefficient are given in 
Tables 6.6, 6.7 and 6.8. 
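As a consistency check on the three tables, the mean square error should be close to the squared bias plus the squared standard deviation. The sketch below applies this to the ml and pwml columns, using the values as transcribed from the tables; the tolerance of 5e-4 is an assumption that allows for rounding in the published figures.

```python
# Values transcribed from the ml and pwml columns of Tables 6.6, 6.7 and 6.8.
abs_bias = {"D1": (0.0119, 0.0119), "D2": (0.0923, 0.0132), "D3": (0.0124, 0.0098)}
std_dev = {"D1": (0.0877, 0.0877), "D2": (0.0972, 0.1230), "D3": (0.0785, 0.1110)}
reported_mse = {"D1": (0.0078, 0.0078), "D2": (0.0180, 0.0153), "D3": (0.0063, 0.0124)}

def mse_from_parts(bias, sd):
    """MSE rebuilt as squared bias plus variance."""
    return bias ** 2 + sd ** 2

gaps = {}
for design in abs_bias:
    for col, name in enumerate(("ml", "pwml")):
        rebuilt = mse_from_parts(abs_bias[design][col], std_dev[design][col])
        gaps[(design, name)] = abs(rebuilt - reported_mse[design][col])
        print(design, name, round(rebuilt, 4), reported_mse[design][col])

# Largest disagreement between rebuilt and reported MSE.
print(max(gaps.values()) < 5e-4)
```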

We see from the tables that the ml estimator is severely biased and very inefficient for the
increasing allocation design D2, but is approximately unconditionally unbiased and efficient
for the designs D1 and D3. The pwml estimator, as expected, is approximately unconditionally
unbiased across all the sample designs considered. Though more biased than the pwml
estimator, the nw estimator is less biased than the ml estimator for the unequal probability
designs. We also see that the nw estimator is more efficient than ml for the design D2 and
approximately equally efficient for design D3. It is also more efficient than the pwml estimator
for the U-shaped design D3.
Figure 6.4 Scattergram of group means of $\hat{B}_{12,ml}$
Figure 6.5 Scattergram of group means of $\hat{B}_{12,pwml}$


Survey Methodology, December 1992 203 


Figure 6.6 Scattergram of group means of $\hat{B}_{12,nw}$ (vertical axis: group means of $\hat{B}$)


The plots of the conditional analysis are shown in Figures 6.4, 6.5 and 6.6. 

We see from Figure 6.4 that the ml estimator is approximately conditionally unbiased for
the designs D1 and D3, and has no additional conditional bias for the design D2. From Figure
6.5 we see that the pwml estimator has no additional conditional bias for any of the designs.
We see from Figure 6.6 that the nw kernel estimator has only a small additional conditional
bias within each of the three probability designs.


Simulation Study 3 
Repeated sampling from a multivariate ‘Real’ population 


In this simulation study we employ the 6,962 actual data points from the Family Expenditure
Survey for the finite population. We consider the same variables as in section 3.1 and
sample repeatedly from this population to investigate the robustness properties of the three
regression estimators. We expect the real population to violate all the normality assumptions.

The unconditional results are shown in Tables 6.9, 6.10 and 6.11, and we see that the nw
kernel estimator is the most efficient and is approximately unconditionally unbiased across
all the probability designs. The ml estimator is less biased and more efficient than the pwml
estimator for the unequal probability designs.

The plots of the conditional analyses are shown in Figures 6.7, 6.8 and 6.9. 

We see from Figure 6.7 that the ml estimator is approximately conditionally unbiased for
the designs D1 and D2 but has a slight conditional bias for design D3. From Figure 6.8 we see
that the pwml estimator has no additional conditional bias for any of the designs. From Figure
6.9 we see that the nw kernel estimator is approximately conditionally unbiased for the three
probability designs.
Table 6.9

Unconditional Absolute Biases of the Three Estimators of $B_{12}$
$N = 6{,}962$, $n = 100$, True Value $B_{12} = 0.595$

                     Absolute biases of
Sample design    $\hat{B}_{12,ml}$    $\hat{B}_{12,pwml}$    $\hat{B}_{12,nw}$
D1               0.0245               0.0245                 0.0056
D2               0.0260               0.0408                 0.0060
D3               0.0128               0.0355                 0.0072
Table 6.10

Unconditional Standard Deviation of the Three Estimators of $B_{12}$

                     Standard deviation
Sample design    $\hat{B}_{12,ml}$    $\hat{B}_{12,pwml}$    $\hat{B}_{12,nw}$
D1               0.111                0.111                  0.111
D2               0.106                0.132                  0.108
D3               0.111                0.122                  0.107
Table 6.11

Unconditional Mean Square Errors of the Three Estimators of $B_{12}$

                     Mean square errors
Sample design    $\hat{B}_{12,ml}$    $\hat{B}_{12,pwml}$    $\hat{B}_{12,nw}$
D1               0.0130               0.0130                 0.0121
D2               0.0120               0.0192                 0.0117
D3               0.0125               0.0161                 0.0123


We conclude from these simulation studies that the new estimator $\hat{B}_{12,nw}$ has performed
well. When the assumptions of linearity and homoscedasticity are violated it appears to be
robust across a variety of designs, to have good efficiency and to have reasonable conditional
as well as unconditional properties. We know from previous studies that $\hat{B}_{12,pwml}$ performs as
well as more conventional p-weighted estimators unconditionally and has far better conditional
properties. The fact that in this study the new estimator $\hat{B}_{12,nw}$ apparently has better properties
than the pwml estimator, which was chosen to represent the class of p-weighted estimators
because of its performance in other simulation studies, suggests that it is an approach that could
be considered in analytic studies of a small number of key parameters.
Some Recent Work on Resampling Methods
for Complex Surveys


J.N.K. RAO, C.F.J. WU and K. YUE¹


ABSTRACT 


Resampling methods for inference with complex survey data include the jackknife, balanced repeated 
replication (BRR) and the bootstrap. We review some recent work on these methods for standard error 
and confidence interval estimation. Some empirical results for non-smooth statistics are also given. 


KEY WORDS: Balanced repeated replication; Bootstrap; Jackknife; Stratified multistage designs; 
Variance estimation. 


1. INTRODUCTION 


Standard sampling theory is largely devoted to estimation of the mean square error (MSE) of
unbiased or approximately unbiased estimators $\hat{Y}$ of a population total $Y$. An estimator of
MSE, or a variance estimator, provides us with a measure of uncertainty in the estimator $\hat{Y}$.
It is a common practice to assume that the estimator $\hat{Y}$ is approximately normally distributed
and then use a two-sided confidence interval $\hat{Y} \pm z_{\alpha/2}\, s(\hat{Y})$ or a one-sided confidence interval
$(\hat{Y} - z_{\alpha} s(\hat{Y}), \infty)$ or $(-\infty, \hat{Y} + z_{\alpha} s(\hat{Y}))$, where $s(\hat{Y})$ is the standard error of $\hat{Y}$ (i.e., square
root of estimated MSE) and $z_{\alpha}$ is the upper $\alpha$-point of a $N(0, 1)$ variable. These intervals cover
the true total $Y$ with a probability of approximately $1 - \alpha$ in large samples, but the actual
coverage probability could be significantly lower than $1 - \alpha$ in small samples or in highly
clustered samples. For nonlinear statistics, such as ratios, regression or correlation coefficients,
the well-known linearization (or Taylor expansion) method is often used (see Rao 1988 for
detailed applications). Resampling methods, such as the jackknife, balanced repeated replication
(BRR) and the bootstrap, are also being used, and in fact several agencies in the U.S.A.
and Canada have adopted the jackknife method of variance estimation for stratified multistage
surveys. An advantage of the linearization method is that it is applicable to general sampling
designs, but it involves the derivation of a separate standard error formula, $s(\hat{\theta})$, for each
nonlinear statistic, $\hat{\theta}$. On the other hand, resampling methods employ a single standard error
formula for all statistics $\hat{\theta}$. However, the jackknife and the BRR methods are strictly applicable
only to those stratified multistage designs in which clusters within strata are sampled with
replacement or the first-stage sampling fraction is negligible. The bootstrap method of Rao
and Wu (1987) works for more general designs, but it is computationally cumbersome and its
properties for complex designs have not been fully investigated.

This paper provides an account of some recent work on resampling methods for complex 
surveys. Some empirical results on jackknife and bootstrap variance estimation for non-smooth 
statistics, such as the median, under stratified cluster sampling and stratified simple random 
sampling are also given. 


¹ J.N.K. Rao, Department of Mathematics and Statistics, Carleton University, Ottawa, Ontario K1S 5B6;
C.F.J. Wu, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario N2L 3G1;
K. Yue, Social Survey Methods Division, Statistics Canada, Ottawa, Ontario K1A 0T6.
2. STRATIFIED MULTISTAGE SAMPLING


Large-scale surveys often employ stratified multistage designs with large numbers of strata,
$L$, and relatively few primary sampling units (clusters), $n_h$ ($\geq 2$), sampled within each stratum
$h$. In fact, it is quite common to select $n_h = 2$ clusters within each stratum to permit the maximum
degree of stratification of clusters consistent with the provision of a valid variance estimator.
We assume that subsampling within sampled clusters is performed to ensure unbiased estimation
of cluster totals.

Let $w_{hik}$ ($> 0$) be the survey weight attached to the $k$-th sample element (ultimate unit) in
the $i$-th sample cluster belonging to the $h$-th stratum. Often, the basic weights $w_{hik}$ are subjected
to post-stratification adjustment to ensure consistency with known totals of post-stratification
variables. For example, the Canadian Labour Force Survey uses a generalized regression
estimator to ensure consistency. We shall, however, ignore this complication in the present
paper. An estimator of the population total $Y$ is of the form

$$\hat{Y} = \sum_{(hik) \in s} w_{hik}\, y_{hik}, \qquad (2.1)$$

where $s$ denotes the sample of elements and $y_{hik}$ is the value of a characteristic of interest, $y$,
associated with the sample element $(hik) \in s$. We assume complete response on all items.

It is a common practice to sample the clusters with probabilities proportional to sizes (pps)
and without replacement to increase the efficiency of the estimators compared to pps sampling
with replacement and to avoid the possibility of selecting the same cluster more than once in
the sample. However, at the stage of variance estimation the calculations are greatly simplified
by treating the sample as if the clusters are sampled with replacement and subsampling is done
independently each time a cluster is selected. This approximation leads to overestimation of
the variance of $\hat{Y}$, but the relative bias is likely to be small if the first stage sampling fraction is
small in each stratum.


Writing $\hat{Y}$ as

$$\hat{Y} = \sum_{h=1}^{L} \bar{r}_h \qquad (2.2)$$

with

$$r_{hi} = \sum_{k} (n_h w_{hik})\, y_{hik}, \qquad \bar{r}_h = \sum_{i} r_{hi}/n_h,$$

we note that the $r_{hi}$ are independent and identically distributed (iid) random variables with the
same mean, $Y_h$, and the same variance in each stratum $h$, under with-replacement sampling
of clusters. It therefore follows that an unbiased estimator of the variance of $\hat{Y}$ is given by

$$s^2(\hat{Y}) = \sum_{h} s_{rh}^2/n_h \qquad (2.3)$$

with

$$(n_h - 1)\, s_{rh}^2 = \sum_{i=1}^{n_h} (r_{hi} - \bar{r}_h)^2.$$

Under without-replacement sampling of clusters, $s^2(\hat{Y})$ will overestimate the true variance of $\hat{Y}$.
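The estimators (2.1) and (2.3) can be sketched as follows. The data layout (a dictionary mapping each stratum to a list of clusters, each cluster a list of (weight, y) pairs), the function names and the toy numbers are assumptions made for illustration.

```python
def estimate_total(sample):
    """Y-hat as in (2.1): sum of weight * y over all sampled elements."""
    return sum(w * y
               for clusters in sample.values()
               for cluster in clusters
               for w, y in cluster)

def variance_of_total(sample):
    """s^2(Y-hat) as in (2.3), treating clusters as sampled with replacement."""
    total_var = 0.0
    for clusters in sample.values():
        n_h = len(clusters)
        # r_hi = sum_k (n_h * w_hik) * y_hik for each sampled cluster.
        r = [n_h * sum(w * y for w, y in cluster) for cluster in clusters]
        r_bar = sum(r) / n_h
        s2_rh = sum((r_hi - r_bar) ** 2 for r_hi in r) / (n_h - 1)
        total_var += s2_rh / n_h
    return total_var

# Toy sample: two strata, two clusters per stratum, two elements per cluster.
sample = {
    "h1": [[(10.0, 3.0), (10.0, 5.0)], [(10.0, 4.0), (10.0, 8.0)]],
    "h2": [[(20.0, 1.0), (20.0, 2.0)], [(20.0, 2.0), (20.0, 5.0)]],
}
print(estimate_total(sample), variance_of_total(sample))
```

With $n_h = 2$ the stratum contribution reduces to $(r_{h1} - r_{h2})^2/4$, which gives a quick hand check of the output.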
We are also often interested in estimating the population distribution function, $F(t)$,
and the $p$-th quantile, $\theta = F^{-1}(p)$, $0 < p < 1$; in particular, the population median
$\theta = F^{-1}(1/2)$. The survey estimator of $F(t)$ is given by

$$\hat{F}(t) = \sum_{(hik) \in s} \tilde{w}_{hik}\, a_{hik}, \qquad (2.4)$$

where $\tilde{w}_{hik} = w_{hik} / \sum_{s} w_{hik}$ are the normalized weights ($\sum_{s} \tilde{w}_{hik} = 1$) and $a_{hik} = 1$ if $y_{hik} \leq t$,
$a_{hik} = 0$ otherwise. The sample $p$-th quantile is obtained as

$$\hat{\theta} = \hat{F}^{-1}(p). \qquad (2.5)$$


In practice, $\hat{\theta}$ is computed by first arranging the sampled values $y_{hik}$ in ascending order, say
$\{y_{(hik)}\}$, and then cumulating the associated normalized weights $\tilde{w}_{(hik)}$ until $p$ is first crossed. The first $y_{(hik)}$
encountered after crossing $p$ is taken as the sample $p$-th quantile, $\hat{\theta}$. Woodruff (1952) obtained
confidence intervals for a quantile, and Rao and Wu (1987) obtained a simple variance estimator
using Woodruff's interval (see also Kovar, Rao and Wu 1988, Francisco and Fuller 1991). Shao
(1991) considered general L-statistics, including the sample Lorenz curve and the Gini coefficient,
which are examples of smooth L-statistics, and the sample quantiles, which are examples
of non-smooth L-statistics.
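The computation of the sample quantile can be sketched as below. The function name `weighted_quantile` is hypothetical, and "crossing" $p$ is taken here to mean that the cumulated normalized weight first reaches or exceeds $p$, which is one reading of the rule; the comparison is made on unnormalized weights to limit rounding error.

```python
def weighted_quantile(values, weights, p):
    """Smallest sampled value whose cumulated weight share reaches p."""
    total = sum(weights)
    cumulative = 0.0
    # Arrange the sampled values in ascending order with their weights.
    for y, w in sorted(zip(values, weights)):
        cumulative += w
        if cumulative >= p * total:
            return y
    return max(values)

# Equal weights: the median of 1, 2, 3, 4 under this rule is 2.
print(weighted_quantile([3, 1, 4, 2], [1, 1, 1, 1], 0.5))
# Unequal weights: the value 1 carries half of the total weight.
print(weighted_quantile([3, 1, 4, 2], [1, 3, 1, 1], 0.5))
```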

Many nonlinear parameters of interest, such as population means, ratios, regression and
correlation coefficients, can be expressed as smooth functions, $\theta = g(Y)$, of a vector of totals,
$Y = (Y_1, \ldots, Y_q)'$, of suitably defined variates. An estimator of $\theta$ is given by $\hat{\theta} = g(\hat{Y})$.
The linearization method may be used to estimate the variance of $g(\hat{Y})$ under any complex
design (see Binder 1983 and Rao 1988).


3. RESAMPLING METHODS 


Resampling methods, such as the jackknife and the bootstrap, are widely used in the iid
case. Suitable modifications and extensions of these methods have also been developed to handle
survey data involving stratification and clustering. We now give a brief account of some recent 
work on three such methods: jackknife, balanced repeated replication and bootstrap, in the 
context of stratified multistage sampling. 


3.1 Jackknife 


For simplicity, assume $\hat{\theta} = g(\hat{Y})$, a smooth function of the estimated total $\hat{Y}$. Let
$\hat{\theta}_{(gj)} = g(\hat{Y}_{(gj)})$ be the estimator of $\theta$ obtained from the sample after omitting the data from
the $j$-th sampled cluster in the $g$-th stratum ($j = 1, \ldots, n_g$; $g = 1, \ldots, L$), where

$$\hat{Y}_{(gj)} = \sum_{\substack{(hik) \in s \\ h \neq g}} w_{hik}\, y_{hik} + \frac{n_g}{n_g - 1} \sum_{\substack{(gik) \in s \\ i \neq j}} w_{gik}\, y_{gik}. \qquad (3.1)$$

Note that $\hat{Y}_{(gj)}$ is obtained by changing the weight of the $(gik)$-th element to $n_g w_{gik}/(n_g - 1)$,
$i \neq j$, but retaining the original weights, $w_{hik}$, for $h \neq g$. A customary delete-1 cluster
jackknife variance estimator of $\hat{\theta}$ is given by

$$s_J^2(\hat{\theta}) = \sum_{g=1}^{L} \frac{n_g - 1}{n_g} \sum_{j=1}^{n_g} (\hat{\theta}_{(gj)} - \hat{\theta})^2. \qquad (3.2)$$

Two variations of $s_J^2(\hat{\theta})$ are obtained by changing $\hat{\theta}$ in (3.2) to $\hat{\theta}_{(g\cdot)} = \sum_j \hat{\theta}_{(gj)}/n_g$ and
$\hat{\theta}_{(\cdot\cdot)} = \sum_g \sum_j \hat{\theta}_{(gj)}/n$, where $n = \sum_g n_g$. In the linear case, $\hat{\theta} = \hat{Y}$, all the jackknife variance
estimators reduce to the "correct" variance estimator, $s^2(\hat{Y})$, given by (2.3). Rao and Wu
(1987) made a second order analysis of the resampling variance estimators when $\hat{\theta}$ is expressed
as a smooth function of totals, $\hat{Y}$. Their main results on the jackknife are: (1) Different jackknife
variance estimators are asymptotically equal to higher order terms, as the number of
strata, $L$, increases. (2) In the important case of $n_h = 2$ for all $h$, the linearization variance
estimator, $s_L^2(\hat{\theta})$, and any jackknife variance estimator are asymptotically equal to higher
order terms, indicating that the choice between the two methods should depend more on
operational considerations than on statistical criteria.
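A minimal sketch of the delete-1 cluster jackknife (3.2) follows. The data layout (stratum mapped to a list of clusters of (weight, record) pairs), the function names and the toy data are assumptions for illustration; any estimator that is a function of the weighted elements can be passed in.

```python
def flatten(sample):
    """List all (weight, record) pairs in a stratified cluster sample."""
    return [(w, rec)
            for clusters in sample.values()
            for cluster in clusters
            for w, rec in cluster]

def jackknife_variance(sample, estimator):
    """Delete-1 cluster jackknife as in (3.2), centred at the full-sample estimate."""
    theta_hat = estimator(flatten(sample))
    variance = 0.0
    for g, clusters in sample.items():
        n_g = len(clusters)
        factor = n_g / (n_g - 1)  # reweighting of the retained clusters in stratum g
        for j in range(n_g):
            replicate = dict(sample)
            replicate[g] = [[(factor * w, rec) for w, rec in cluster]
                            for i, cluster in enumerate(clusters) if i != j]
            theta_gj = estimator(flatten(replicate))
            variance += (n_g - 1) / n_g * (theta_gj - theta_hat) ** 2
    return variance

# Toy sample: records are (y, x) pairs.
sample = {
    "h1": [[(10.0, (3.0, 1.0)), (10.0, (5.0, 2.0))],
           [(10.0, (4.0, 1.0)), (10.0, (8.0, 3.0))]],
    "h2": [[(20.0, (1.0, 1.0)), (20.0, (2.0, 1.0))],
           [(20.0, (2.0, 1.0)), (20.0, (5.0, 2.0))]],
}

def total_y(elems):
    return sum(w * rec[0] for w, rec in elems)

def ratio_yx(elems):
    return total_y(elems) / sum(w * rec[1] for w, rec in elems)

print(jackknife_variance(sample, total_y))   # linear case
print(jackknife_variance(sample, ratio_yx))  # nonlinear (ratio) case
```

For the total, the result can be checked by hand against (2.3): each stratum contributes $(r_{h1} - r_{h2})^2/4$ when $n_h = 2$.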

A drawback of the customary delete-1 jackknife method in the case of independent and
identically distributed (i.i.d.) observations is that, unlike the bootstrap, it fails to provide a
consistent variance estimator for non-smooth statistics, such as the median. Shao and Wu
(1989), however, have shown that this deficiency of the delete-1 jackknife can be rectified by
using a more general jackknife, called the delete-$d$ jackknife, with the number of observations
deleted, $d$, depending on a smoothness measure of the statistic. In particular, for the sample
quantiles, the delete-$d$ jackknife with $d$ satisfying $n^{1/2}/d \to 0$ and $n - d \to \infty$ as $n \to \infty$ leads
to consistent variance estimators in the case of i.i.d. observations. This result suggests that
a similar effect might hold in the case of the delete-1 cluster jackknife for stratified multistage
sampling, since all the sampled elements in a sampled cluster $(gj)$ are deleted in computing
$s_J^2(\hat{\theta})$ given by (3.2). At present we are studying this problem theoretically, but we performed
a limited simulation study which suggests that the delete-1 cluster jackknife variance estimator
$s_J^2(\hat{\theta})$ might perform quite well. We now report the results of the simulation study for the
median, $\hat{\theta} = \hat{F}^{-1}(1/2)$.


For the simulation study, we generated stratified cluster samples $\{y_{hik},\ k = 1, \ldots, M;\
i = 1, \ldots, n_h;\ h = 1, \ldots, L\}$ employing the nested error model $y_{hik} = \mu_h + a_{hi} + e_{hik}$ with
$a_{hi} \sim$ iid $N(0, \sigma_{ah}^2)$ and $e_{hik} \sim$ iid $N(0, \sigma_{eh}^2)$, where the cluster size, $M$, is assumed to be equal for all
clusters $(hi)$, and the intra-cluster correlations, $\sigma_{ah}^2/(\sigma_{ah}^2 + \sigma_{eh}^2) = \rho_h$, are assumed to be equal
for all strata $h$ (i.e., $\rho_h = \rho$). The normalized survey weights are given by $\tilde{w}_{hik} =
W_h/(n_h M)$, where $W_h$ denotes the relative size of stratum $h$. The number of strata $L$ ($= 32$), strata
means, $\mu_h$, variances $\sigma_h^2 = \sigma_{ah}^2 + \sigma_{eh}^2$ and sizes $W_h$ were chosen to correspond to real populations
encountered in the US National Assessment of Educational Progress Study (Hansen and
Tepping 1985). We generated 1,000 independent stratified cluster samples with $n_h = 2$ for
each selected combination $(\rho, M)$ and then computed the bias and relative bias of the jackknife
variance estimator, $s_J^2(\hat{\theta})$, for the median: Bias$[s_J^2(\hat{\theta})] = \sum_t s_{Jt}^2(\hat{\theta})/1{,}000 - \mathrm{MSE}(\hat{\theta})$,
where $s_{Jt}^2(\hat{\theta})$ is the value of $s_J^2(\hat{\theta})$ for the $t$-th simulated sample ($t = 1, \ldots, 1{,}000$) and Rel.
Bias$[s_J^2(\hat{\theta})] = \mathrm{Bias}[s_J^2(\hat{\theta})]/\mathrm{MSE}(\hat{\theta})$. We calculated $\mathrm{MSE}(\hat{\theta})$ from an independent set of
10,000 stratified cluster samples for each $(\rho, M)$: $\mathrm{MSE}(\hat{\theta}) = \sum_t (\hat{\theta}_t - \hat{\theta}_{\cdot})^2/10{,}000$, where
$\hat{\theta}_t$ is the value of $\hat{\theta}$ for the $t$-th simulated sample, $\hat{\theta}_{\cdot} = \sum_t \hat{\theta}_t/10{,}000$ and $t = 1, \ldots, 10{,}000$.

Table 1 reports the simulated values of bias and relative bias (in brackets) of the jackknife
variance estimator for selected combinations of $\rho$ and $M$. First, we note that for the special
case of stratified simple random sampling ($\rho = 0$, $M = 1$), the relative bias is very large
(116%), thus confirming the inconsistency of $s_J^2(\hat{\theta})$ in this case. Second, we observe that both
the bias and relative bias decrease as $M$ increases for a given $\rho$. Moreover, for a given cluster
size $M$, the bias generally increases with $\rho$, but the relative bias in fact decreases because $\mathrm{MSE}(\hat{\theta})$
is increasing faster than the bias as $\rho$ increases. It is indeed gratifying that the relative bias is
no more than 10% for $M \geq 30$ and $\rho \geq 0.10$ or $M \geq 20$ and $\rho \geq 0.20$.

Table 1

Bias and % Relative Bias (in Brackets) of Jackknife Variance Estimator for
the Median Under Stratified Cluster Sampling ($n_h = 2$, $L = 32$)
and Selected Values of Equal Intra-Cluster Correlation, $\rho$,
and Equal Cluster Size, $M$

                                   M
$\rho$     1           10          20          30          50
0          7.5(116)    .28(41)     .09(29)     .04(15)     .01(15)
0.05       -           2207)       .09(18)     .05(12)     .03(8)
0.10       -           .28(28)     .10(14)     .06(9)      .02(3)
0.20       -           .31(22)     .11(10)     .08(8)      .03(3)
0.30       -           .32(18)     ia          .07(5)      .01(1)
0.50       -           .44(17)     .15(6)      .11(5)      .04(2)


3.2 Balanced Repeated Replication (BRR) 


Balanced repeated replication (BRR) was proposed by McCarthy (1969) for the important
special case of $n_h = 2$ clusters per stratum. A set of $R$ balanced half-samples (replications)
is formed by deleting one cluster from the sample in each stratum. This set may be defined
by an $R \times L$ design matrix $(\delta_{rh})$, $1 \leq r \leq R$, $1 \leq h \leq L$, with $\delta_{rh} = +1$ or $-1$ according
as whether the first or second sample cluster in the $h$-th stratum is in the $r$-th half-sample, and
$\sum_r \delta_{rh}\delta_{rh'} = 0$ for all $h \neq h'$, i.e. the columns of the matrix are orthogonal. A minimal set of
$R$ balanced half-samples may be constructed from Hadamard matrices ($L + 1 \leq R \leq L + 4$)
by choosing any $L$ columns, excluding the column of $+1$'s.

Let $\hat{\theta}^{(r)}$ be the estimator of $\theta$ obtained from the $r$-th half-sample. Note that $\hat{\theta}^{(r)}$ is obtained
from $\hat{\theta}$ by changing the weight of the $(hik)$-th element to $2w_{hik}$ or 0 according as the $(hi)$-th
cluster is selected or not selected in the half-sample. A BRR variance estimator of $\hat{\theta}$ is given by

$$s_{BRR}^2(\hat{\theta}) = \frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}^{(r)} - \hat{\theta})^2. \qquad (3.3)$$


Several variations of $s_{BRR}^2(\hat{\theta})$ are also available; for example, $\hat{\theta}$ may be changed to $\hat{\theta}^{(\cdot)} =
\sum_r \hat{\theta}^{(r)}/R$. In the linear case, $\hat{\theta} = \hat{Y}$, all the BRR variance estimators reduce to the "correct"
variance estimator, $s^2(\hat{Y})$, as in the case of the jackknife.
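The construction of balanced half-samples and the estimator (3.3) can be sketched for $n_h = 2$ as follows. The Sylvester doubling construction is used here for convenience; it only yields orders that are powers of two, so the number of replicates may exceed the minimal $R$ mentioned above. Function names, data layout and toy data are assumptions.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of size `order` (a power of two) by Sylvester doubling."""
    matrix = [[1]]
    while len(matrix) < order:
        matrix = ([row + row for row in matrix]
                  + [row + [-v for v in row] for row in matrix])
    return matrix

def balanced_half_samples(n_strata):
    """R x L matrix of +1/-1 entries with mutually orthogonal columns."""
    order = 1
    while order < n_strata + 1:
        order *= 2
    # Column 0 is the column of +1's and is excluded.
    return [row[1:n_strata + 1] for row in sylvester_hadamard(order)]

def brr_variance(sample, estimator):
    """BRR variance (3.3) for two clusters per stratum."""
    def flatten(s):
        return [(w, y) for clusters in s.values() for c in clusters for w, y in c]
    strata = list(sample)
    design = balanced_half_samples(len(strata))
    theta_hat = estimator(flatten(sample))
    total = 0.0
    for row in design:
        replicate = {}
        for h, delta in zip(strata, row):
            keep = 0 if delta == 1 else 1  # +1: first cluster, -1: second cluster
            replicate[h] = [[(2.0 * w, y) for w, y in sample[h][keep]]]
        total += (estimator(flatten(replicate)) - theta_hat) ** 2
    return total / len(design)

sample = {
    "h1": [[(10.0, 3.0), (10.0, 5.0)], [(10.0, 4.0), (10.0, 8.0)]],
    "h2": [[(20.0, 1.0), (20.0, 2.0)], [(20.0, 2.0), (20.0, 5.0)]],
}

def total_y(elems):
    return sum(w * y for w, y in elems)

design = balanced_half_samples(2)
print(design)
print(brr_variance(sample, total_y))
```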

Krewski and Rao (1981) established the consistency of $s_J^2(\hat{\theta})$ and $s_{BRR}^2(\hat{\theta})$ for smooth
statistics $\hat{\theta} = g(\hat{Y})$, as $L$ increases. Rao and Wu (1985) made a second order analysis and
showed that $s_{BRR}^2(\hat{\theta})$ and $s_L^2(\hat{\theta})$ are not asymptotically equivalent to second order terms,
unlike $s_J^2(\hat{\theta})$ and $s_L^2(\hat{\theta})$. Shao and Wu (1992) established the consistency of $s_{BRR}^2(\hat{\theta})$ for the
quantiles, $\hat{\theta} = \hat{F}^{-1}(p)$.

The BRR method has been extended to the case of $n_h = p > 2$ clusters per stratum for
$p$ prime or a power of a prime (Gurney and Jewett 1975), but the number of replications, $R$,
needed is much larger than in the case of $n_h = 2$. In many survey designs the $n_h$'s are not equal.
To accommodate the general case of unequal $n_h$, Gupta and Nigam (1987) and Wu (1991)
advocated the use of mixed-level orthogonal arrays of strength two for drawing balanced
replicates, where $n_h$ is the number of symbols in the $h$-th column of the array. Orthogonality
of the array guarantees that the replicates drawn are balanced. Unlike the case of equal $n_h$,
the adjustment of survey weights is more complicated. A correct method was given by Wu
(1991). From his formula (6), two separate adjustments should be applied to the sampled and
unsampled units in each replicate. Simple algebra on Wu's equation (6) shows that $w_{hik}$ is
changed to $w_{hik}^{(r)} = [1 + (n_h - 1)^{1/2}]\, w_{hik}$ or $w_{hik}^{(r)} = [1 - (n_h - 1)^{-1/2}]\, w_{hik}$ according as the
$(hik)$-th element is selected or not selected in the replicate. (Note that these reduce to $2w_{hik}$ and 0
for $n_h = 2$.) The remaining calculations of $\hat{\theta}^{(r)}$ and $s_{BRR}^2(\hat{\theta})$ are the same as in (3.3). Furthermore,
these modified survey weights can be applied to $\hat{\theta} = \hat{F}^{-1}(p)$ and more general $\hat{\theta} = T(\hat{F})$,
where $T$ is a functional of $F$. All we need to do is to change $w_{hik}$ in (2.4) to $w_{hik}^{(r)}$,
according as the $(hik)$-th element is selected or not selected in the $r$-th replicate, to get the estimate $\hat{F}^{(r)}$ of
$F$ for the $r$-th replicate, and $\hat{\theta}^{(r)} = T(\hat{F}^{(r)})$. The calculation of the BRR variance estimator
is the same as in (3.3).
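The replicate weight adjustment can be checked numerically. The sketch below assumes the multipliers are $1 + (n_h - 1)^{1/2}$ for the selected unit and $1 - (n_h - 1)^{-1/2}$ for each unselected unit, with one unit selected per stratum; the function name is hypothetical. Under this assumption the multipliers sum to $n_h$ within a stratum and reduce to 2 and 0 when $n_h = 2$.

```python
import math

def replicate_weight_factors(n_h):
    """Multipliers for the selected unit and for each unselected unit in a stratum."""
    selected = 1.0 + math.sqrt(n_h - 1)
    not_selected = 1.0 - 1.0 / math.sqrt(n_h - 1)
    return selected, not_selected

for n_h in (2, 3, 4):
    sel, rest = replicate_weight_factors(n_h)
    # One unit is selected, the other n_h - 1 are not.
    print(n_h, round(sel, 4), round(rest, 4), round(sel + (n_h - 1) * rest, 10))
```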

There are two problems with the use of mixed orthogonal arrays. First, the array size can
be large for general $n_h$. Second, orthogonal arrays do not exist for every combination of $n_h$'s.
A practical solution is to group the $n_h$ sample psu's in stratum $h$ into two to four groups of
psu's and then apply the method to the groups by treating the groups as units in the BRR
method. This extension is called the grouped BRR method. As shown by Wu (1991), its efficiency
loss can be relatively small, compared to the full BRR, if the groupings are done
judiciously. For example, more groups are needed if $n_h$ is large and the units within the
stratum are more heterogeneous. For $n_h = 2$, 3 or 4, many mixed orthogonal arrays have
been constructed (see, for example, Dey 1985 and Wang and Wu 1991). If $n_h$ can only take the values
2 or 4, saturated orthogonal arrays for any combination can be easily constructed as in Wu
(1989). That is, the number of replications can be as small as possible. It is therefore possible
to compile a large collection of mixed orthogonal arrays for practical use if $n_h$ is restricted to
2, 3 or 4.

The BRR method and extensions considered thus far only take one unit (psu) per stratum
for each replicate. For large $n_h$, say more than 3, Sitter (1992) proposed the use of orthogonal
multi-arrays to allow the number of resampled units per stratum to be greater than one. It may
require fewer replicates and it can cover cases where orthogonal arrays of strength two are not
available; for example, $n_h = 6$.


3.3 Bootstrap 


The bootstrap method for the iid case has been extensively studied (Efron 1982). Rao and
Wu (1987) provided an extension to stratified multistage designs, but covering only smooth
statistics $\hat{\theta} = g(\hat{Y})$. They noted that, in order to have valid variance estimation in the case
of small $n_h$, some scale adjustment, similar to those in Section 3.2, is necessary. What they
did not realize is that the scale adjustment should be made on the survey weights $w_{hik}$ rather
than on the $y_{hik}$ values directly, which is what they proposed. As a result, their method cannot be
extended to cover the quantile $\hat{\theta} = \hat{F}^{-1}(p)$. We now present a general method that covers
smooth as well as non-smooth statistics for arbitrary sizes, $n_h$. It works as follows: (i) Draw
a simple random sample of $m_h$ clusters with replacement from the $n_h$ sample clusters,
independently for each $h$. Let $m_{hi}^*$ be the number of times the $(hi)$-th sample cluster is selected
($\sum_i m_{hi}^* = m_h$). Define the bootstrap weights

$$w_{hik}^* = w_{hik} \left[ \left\{ 1 - \left( \frac{m_h}{n_h - 1} \right)^{1/2} \right\} + \left( \frac{m_h}{n_h - 1} \right)^{1/2} \frac{n_h}{m_h}\, m_{hi}^* \right]. \qquad (3.4)$$




If the $(hi)$-th cluster is not selected in the bootstrap sample, $m_{hi}^* = 0$ and the second term of
(3.4) vanishes. If $m_h$ is chosen to be less than or equal to $n_h - 1$, then the bootstrap weights
$w_{hik}^*$ are all non-negative if $w_{hik} > 0$ for all $(hik) \in s$. Calculate $\hat{\theta}^*$, the bootstrap estimator of $\theta$, using
the weights $w_{hik}^*$ in the formula for $\hat{\theta}$. The bootstrap median, for example, is calculated as
before using the normalized bootstrap weights $\tilde{w}_{hik}^* = w_{hik}^*/\sum_s w_{hik}^*$, provided all $w_{hik}^* \geq 0$.
(ii) Independently replicate step (i) a large number, $B$, of times and calculate the corresponding
estimates $\hat{\theta}_{(1)}^*, \ldots, \hat{\theta}_{(B)}^*$.


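The bootstrap variance estimator $s_{BOOT}^2(\hat{\theta}) = E_*(\hat{\theta}^* - E_*\hat{\theta}^*)^2$ is approximated by

$$s_{BOOT}^2(\hat{\theta}) \approx \frac{1}{B} \sum_{b=1}^{B} [\hat{\theta}_{(b)}^* - \hat{\theta}]^2. \qquad (3.5)$$

A variation of (3.5) is obtained by changing $\hat{\theta}$ to $\hat{\theta}_{(\cdot)}^* = \sum_b \hat{\theta}_{(b)}^*/B$. In the linear case, $s_{BOOT}^2(\hat{Y})$
reduces to the "correct" variance estimator $s^2(\hat{Y})$.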
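Steps (i) and (ii) and the Monte Carlo approximation (3.5) can be sketched as follows, assuming the rescaling factor $1 - \lambda_h + \lambda_h (n_h/m_h) m_{hi}^*$ with $\lambda_h = \{m_h/(n_h - 1)\}^{1/2}$ and the choice $m_h = n_h - 1$. The function names, data layout, seed and toy data are illustrative assumptions.

```python
import math
import random

def bootstrap_factors(n_h, m_h, rng):
    """Multipliers applied to w_hik for each of the n_h clusters of one stratum."""
    counts = [0] * n_h
    for _ in range(m_h):
        counts[rng.randrange(n_h)] += 1  # draw clusters with replacement
    lam = math.sqrt(m_h / (n_h - 1))
    return [1.0 - lam + lam * (n_h / m_h) * c for c in counts]

def bootstrap_variance(sample, estimator, n_boot=500, seed=0):
    """Monte Carlo approximation of the bootstrap variance, centred at theta-hat."""
    rng = random.Random(seed)

    def flatten(s):
        return [(w, y) for clusters in s.values() for c in clusters for w, y in c]

    theta_hat = estimator(flatten(sample))
    total = 0.0
    for _ in range(n_boot):
        replicate = {}
        for h, clusters in sample.items():
            n_h = len(clusters)
            factors = bootstrap_factors(n_h, n_h - 1, rng)  # m_h = n_h - 1
            replicate[h] = [[(f * w, y) for w, y in cluster]
                            for f, cluster in zip(factors, clusters)]
        total += (estimator(flatten(replicate)) - theta_hat) ** 2
    return total / n_boot

sample = {
    "h1": [[(10.0, 3.0), (10.0, 5.0)], [(10.0, 4.0), (10.0, 8.0)]],
    "h2": [[(20.0, 1.0), (20.0, 2.0)], [(20.0, 2.0), (20.0, 5.0)]],
}

def total_y(elems):
    return sum(w * y for w, y in elems)

v_boot = bootstrap_variance(sample, total_y)
print(round(v_boot, 1))
```

For this toy sample the with-replacement variance estimator of the total works out to 8,000 by hand, so the printed value should be near that figure up to Monte Carlo error.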


Rao and Wu (1987) obtained bootstrap-$t$ confidence intervals for smooth functions,
$\hat{\theta} = g(\hat{Y})$, by approximating the distribution of $t = (\hat{\theta} - \theta)/s_J(\hat{\theta})$ by its bootstrap counterpart
$t^* = (\hat{\theta}^* - \hat{\theta})/s_J(\hat{\theta}^*)$, where $s_J^2(\hat{\theta}^*)$ is obtained from (3.2) with $w_{hik}$ changed to $w_{hik}^*$. A
two-sided $(1 - \alpha)$-level confidence interval for $\theta$ is then given by $\{\hat{\theta} - t_U^* s_J(\hat{\theta}),\ \hat{\theta} - t_L^* s_J(\hat{\theta})\}$,
where $t_L^*$ and $t_U^*$ are the lower and upper $\alpha/2$-points of $t^*$ obtained from the bootstrap
histogram of $t_{(1)}^*, \ldots, t_{(B)}^*$. One-sided confidence intervals can also be obtained from the
bootstrap histogram. Empirical work by Kovar, Rao and Wu (1988) for smooth functions
indicates that the bootstrap-$t$ interval with $m_h = n_h - 1$ tracks the error rates in both the
lower and upper tails better than the jackknife interval $\{\hat{\theta} - z_{\alpha/2} s_J(\hat{\theta}),\ \hat{\theta} + z_{\alpha/2} s_J(\hat{\theta})\}$, but
the total error rate is not distinguishable from the latter, i.e., for two-sided intervals, they exhibit
similar performance in terms of actual coverage probability. If a variance stabilizing transformation
can be found, such as the $\tanh^{-1}$ transformation on the estimated correlation coefficient,
then the problem of uneven error rates in the two tails for the jackknife interval seems
to be corrected. This suggests that the jackknife interval, or any other normal-theory interval,
based on such transformations can be useful when the transformations are known, while the
bootstrap provides an alternative when such transformations do not exist or are unknown.

We now present the results of a limited simulation study on the performance of the proposed
bootstrap method in the case of the median. Employing the Hansen-Tepping basic population 1
with $L = 32$ strata (see Kovar et al. 1988, Sections 3 and 6 for details), we generated
500 independent stratified simple random samples with $n_h = 5$ and then computed the relative
bias and coefficient of variation (relative stability) of the Woodruff-based variance estimator
with $\alpha = 0.1$ (see Kovar et al. 1988, eq. (2.8)), the BRR variance estimator (3.3) and the
bootstrap variance estimator (3.5) and its variation obtained by changing $\hat{\theta}$ to $\hat{\theta}_{(\cdot)}^*$. We used
$m_h = n_h - 1$ and $n_h - 3$ and $B = 500$ bootstrap replicates for each sample, while the BRR
replicates were obtained from an orthogonal array with 250 runs. The true MSE of $\hat{\theta}$ was
approximated by selecting 10,000 independent stratified random samples. We also calculated
the error rates in each tail (nominal rate of 5% in each tail) and standardized lengths of the
normality-based confidence interval using the BRR variance estimator, the Woodruff interval
and the bootstrap interval obtained from the percentile method using the bootstrap histogram
of $\hat{\theta}_{(1)}^*, \ldots, \hat{\theta}_{(B)}^*$ for each sample.

Table 2 reports the simulated values of the relative bias, coefficient of variation, lower (L)
and upper (U) error rates, and standardized lengths. First, we note that the bootstrap variance
estimator (3.5) has a larger relative bias and a slightly larger coefficient of variation (CV) than
its variation obtained by changing $\hat{\theta}$ to $\hat{\theta}_{(\cdot)}^*$: relative bias of 12.6% vs. 7.5% and CV of 52%
vs. 48% for $m_h = n_h - 1 = 4$. On the other hand, the BRR variance estimator has the
smallest relative bias (3.1%) and the smallest CV (31%), while the Woodruff-based variance
estimator has a smaller relative bias (4.2%) than the bootstrap and a comparable CV (47%). Secondly, the lower
and upper error rates are close to the nominal level (5%) for the bootstrap and the BRR
intervals, while the error rates are slightly uneven for the Woodruff interval (L = 4.2% and
U = 5.6%). Finally, we note that the standardized lengths are roughly equal for all the
methods. Overall, the bootstrap variance estimator and the bootstrap intervals based on the
percentile method did not exhibit better performance relative to either the BRR variance
estimator and the associated normality-based interval or the Woodruff-based variance estimator
and the Woodruff interval.

Table 2

% Relative Bias and % CV of Variance Estimator and Error Rates
and Standardized Lengths of Confidence Intervals
(Nominal Level of 5% in Each Tail) for the Median Under Stratified
Simple Random Sampling ($L = 32$, $n_h = 5$)

                                              Error Rate
Method          % Rel. Bias    % CV          L       U       St. Length
Woodruff        4.2            47            4.2     5.6     0.997
BRR             3.1            31            5.0     5.0     1.004
Bootstrap*:
  $m_h = 4$     12.6 (7.5)     52 (48)       5.0     Sea     0.987
  $m_h = 2$     13.0 (7.8)     54 (49)       5.0     4.8     0.988

* Results for the variation of the bootstrap variance estimator are given in the brackets.
An Estimating Function Approach to Finite
Population Estimation


HAROLD J. MANTEL¹


ABSTRACT 


Godambe and Thompson (1986) define and develop simultaneous optimal estimation of superpopulation
and finite population parameters based on a superpopulation model and a survey sampling design.
Their theory defines the finite population parameter, $\theta_N$, as the solution of the optimal estimating
equation for the superpopulation parameter $\theta$; however, some other finite population parameter, $\phi$, may
be of interest. We propose to extend the superpopulation model in such a way that the parameter of interest,
$\phi$, is a known function of $\theta_N$, say $\phi = f(\theta_N)$. Then $\phi$ is optimally estimated by $f(\hat{\theta}_s)$, where $\hat{\theta}_s$ is the
optimal estimator of $\theta_N$, as given by Godambe and Thompson (1986), based on the sample $s$ and the
sampling design.


KEY WORDS: Estimating functions; Generalized linear estimator; Finite population parameter. 


1. ESTIMATION OF A MEAN 


The problem discussed in this paper is the estimation of a finite population parameter such 
as the mean based on a sample survey. There is also a hypothesized superpopulation regression 
model relating the variable of interest to some known covariables. The objective is an estimation 
procedure which has good properties with respect to both the sampling design and the hypothe- 
sized model. The approach here is based on the work of Godambe and Thompson (1986). 

We suppose that we have a finite population of labeled individuals $P = \{i : i = 1, \ldots, N\}$.
With each individual $i$ is associated an unknown variable $y_i$ and a vector of covariables, $x_i$.
The vector $x_i$ may be known for all $i \in P$, or only for $i$ in the sample with the population mean
$\bar{x}_N$ also known. Letting $E_m$ denote expectation with respect to the superpopulation model,
the model assumptions are:


(i) $y_i$ and $y_j$ are independent for $i \neq j$


(ii) $E_m(y_i) = x_i'\beta$ for some unknown real vector $\beta$

(iii) $E_m(y_i - x_i'\beta)^2 = \sigma^2 v_i$, $i = 1, \ldots, N$, for known $v_i$ and some unknown $\sigma^2$.
Following Godambe and Thompson (1986) we define a finite population parameter $\beta_N$ as
the solution of the linearly optimal estimating equation

$$g^* = \sum_{i=1}^{N} (y_i - x_i'\beta)\, x_i/v_i = 0, \qquad (1)$$

that is,

$$\beta_N = (X_N' V_N^{-1} X_N)^{-1} X_N' V_N^{-1} y_N, \qquad (2)$$


where $y_N = (y_1, \ldots, y_N)^T$, $V_N$ is a diagonal matrix with entries $v_1, \ldots, v_N$, and $X_N$ is a
matrix with $N$ rows, the $i$th row being $x_i^T$.

¹ H.J. Mantel, Social Survey Methods Division, Statistics Canada, Ottawa, Ontario, Canada K1A 0T6.

Now $\beta$ is unknown. Godambe and Thompson (1986) defined and developed simultaneous
optimal estimation of $\beta$ and $\beta_N$ based on the model and the sampling design. We will denote
the data from a sample survey by $\chi_s = \{(i, y_i), i \in s\}$.

For simultaneous estimation of $\beta$ and $\beta_N$ we consider estimating functions $h(\chi_s, \beta)$ such
that $E_p(h) = g^*$ in (1), where $E_p$ denotes expectation with respect to the sampling design. A
function $h^*$ in this class is called optimal if for all other $h$ in the class $E_m E_p\{hh^T\} -
E_m E_p\{h^* h^{*T}\}$ is non-negative definite. Theorem 1 of Godambe and Thompson (1986) shows
that the optimal function $h^*$ is given by

$$h^*(\chi_s, \beta) = \sum_{i \in s} (y_i - x_i^T\beta)\,x_i / (\pi_i v_i), \qquad (3)$$


where $\pi_i$ is the probability under the sampling design that individual $i$ is included in the sample
$s$. We will denote the root of this function by $\beta_s$, that is,

$$\beta_s = (X_s^T \Pi_s^{-1} V_s^{-1} X_s)^{-1} X_s^T \Pi_s^{-1} V_s^{-1} y_s, \qquad (4)$$

where $y_s$ is the vector of $y_i$'s for $i \in s$, $\Pi_s$ and $V_s$ are diagonal matrices with entries $\pi_i$ and $v_i$
respectively, $i \in s$, and $X_s$ is the matrix with rows $x_i^T$, $i \in s$.

So far we have discussed only estimation of $\beta$ or $\beta_N$. Our problem was to estimate $\bar{y}_N$, the
population mean of the $y_i$'s. One possibility is to use a generalized regression estimator,

$$\bar{y}_{GREG} = \bar{x}_N^T \beta_s + 1_s^T \Pi_s^{-1} (y_s - X_s\beta_s)/N, \qquad (5)$$

where $1_s$ is a vector of 1's whose length is the size of the sample $s$. This estimator is discussed,
for example, by Särndal, Swensson and Wretman (1992). The first part of the estimator
gives good model properties while the second part gives good design properties. However,
the model and design justifications of $\bar{y}_{GREG}$ in (5) do not depend on the particular form of
$\beta_s$, and there is no immediately apparent reason why $\beta_s$ in (5) could not be replaced by
a purely model based estimator of $\beta$. The design optimality of $\beta_s$ is apparently irrelevant.
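The quantities in (2), (4) and (5) can be computed in a few lines. The following is only a minimal sketch under stated assumptions: the population, the variance function $v_i = x_i$ and the simple random sampling design are hypothetical choices made for illustration, and the function names are not from the paper.

```python
import numpy as np

def beta_census(X, v, y):
    """beta_N of (2): census weighted least squares with weights 1/v_i."""
    Xw = X / v[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def beta_sample(Xs, vs, pis, ys):
    """beta_s of (4): sample weighted least squares with weights 1/(pi_i v_i)."""
    Xw = Xs / (pis * vs)[:, None]
    return np.linalg.solve(Xw.T @ Xs, Xw.T @ ys)

def greg_mean(xbar_N, N, Xs, vs, pis, ys):
    """Generalized regression estimator (5) of the population mean."""
    b = beta_sample(Xs, vs, pis, ys)
    return xbar_N @ b + np.sum((ys - Xs @ b) / pis) / N

# Hypothetical population: intercept plus one covariate, variance proportional to x.
rng = np.random.default_rng(0)
N, n = 200, 40
X = np.column_stack([np.ones(N), rng.uniform(1.0, 5.0, N)])
v = X[:, 1].copy()
y = X @ np.array([2.0, 3.0]) + np.sqrt(v) * rng.normal(size=N)

s = rng.choice(N, size=n, replace=False)   # simple random sample without replacement
pis = np.full(n, n / N)                    # inclusion probabilities n/N
y_greg = greg_mean(X.mean(axis=0), N, X[s], v[s], pis, y[s])
print(y_greg, y.mean())
```

Two sanity checks follow from the formulas: with a census and $\pi_i = 1$ the estimator reproduces $\bar{y}_N$, and when $y_i$ is exactly linear in $x_i$ it reproduces $\bar{y}_N$ for any sample.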

The estimator we will propose here more closely integrates the hypothesized model with the
finite population parameter $\bar{y}_N$. Since $\beta_N$ in (2) is optimally estimated by $\beta_s$ in (4), functions
of $\beta_N$ are optimally estimated by the same function of $\beta_s$. If $\bar{y}_N = u^T\beta_N$ for some vector $u$ then
we would estimate $\bar{y}_N$ by $u^T\beta_s$. Such a $u$ exists if and only if $V_N 1_N$ is in the column space of
$X_N$, in which case, with $V_N 1_N = X_N a$, we may take $u = X_N^T V_N^{-1} X_N a/N = \bar{x}_N$. The idea
then is that if $V_N 1_N$ is not in the column space of $X_N$, we will add it. In doing so we lose
something of model efficiency, though the augmented model remains valid in light of the
original model. We relax model efficiency to gain some sort of finite population relevance.
As an interesting special case we note that when the model variances do not depend on $i$ our
approach leads to including an arbitrary constant term in the regression model.

The approach taken here seems quite similar to that of Little (1983) who suggests model 
based estimation restricting attention to models that yield asymptotically design consistent 
estimators. Alternatively, Isaki and Fuller (1982) suggest restricting to designs for which the 
model based estimator is asymptotically design consistent.


2. COMPARISON TO THE GENERALIZED REGRESSION ESTIMATOR


Let $W_N$ be the design matrix for the augmented model, that is

$$W_N = (V_N 1_N, X_N). \qquad (6)$$

For the discussion of this section we assume that $V_N 1_N$ is not in the column space of $X_N$.
Similarly, let $W_s$ be the augmented form of $X_s$, and $\gamma$, $\gamma_N$, and $\gamma_s$ be the augmented forms of
$\beta$, $\beta_N$, and $\beta_s$ respectively.


For convenience, we will refer to our estimator of the population mean as the augmented
regression estimator,

$$\bar{y}_{AREG} = \bar{w}_N^T \gamma_s. \qquad (7)$$

We first show that $\bar{y}_{AREG}$ is also a type of generalized difference estimator. From (6), if
$u$ is a vector of appropriate length with the first entry equal to one and the rest zeros then
$W_N u = V_N 1_N$ and $W_s u = V_s 1_s$. Then

$$1_s^T \Pi_s^{-1} W_s \gamma_s = u^T W_s^T V_s^{-1} \Pi_s^{-1} W_s (W_s^T V_s^{-1} \Pi_s^{-1} W_s)^{-1} W_s^T V_s^{-1} \Pi_s^{-1} y_s = 1_s^T \Pi_s^{-1} y_s,$$

and it follows that the second part of the generalized regression estimator in (5) with $\beta_s$
replaced by $\gamma_s$ is equal to 0.


Secondly, let us compare $\bar{y}_{AREG}$ in (7) to $\bar{y}_{GREG}$ in (5). A few tedious calculations give us that

$$\bar{y}_{AREG} = \bar{x}_N^T \beta_s + (c_1/c_2)\, 1_s^T \Pi_s^{-1} (y_s - X_s\beta_s)/N,$$

where

$$c_1 = 1_N^T \left( V_N 1_N - X_N (X_s^T V_s^{-1} \Pi_s^{-1} X_s)^{-1} X_s^T \Pi_s^{-1} 1_s \right)$$

and

$$c_2 = 1_s^T \Pi_s^{-1} \left( V_s 1_s - X_s (X_s^T V_s^{-1} \Pi_s^{-1} X_s)^{-1} X_s^T \Pi_s^{-1} 1_s \right).$$

Written in this way $\bar{y}_{AREG}$ appears very similar to $\bar{y}_{GREG}$ except for an adjusted weight for the
second part. It does not seem possible to give a heuristic explanation of the weight $(c_1/c_2)$.
However, we note that $c_1$ is just the population sum of the residuals from a weighted regression
of the $v_i$'s onto the $x_i$'s based on the sample $s$, and $c_2$ looks something like a Horvitz-Thompson
estimator of $c_1$, except that the residuals also depend on the sample $s$. For large
samples from large populations we would expect $(c_1/c_2)$ to be close to 1.
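Both claims of this section can be checked numerically. The sketch below is illustrative only: the population, the choice $v_i = x_i^2$ (so that $V_N 1_N$ lies outside the column space of $X_N$) and the simple random sample are assumptions, and `wls` is a hypothetical helper name.

```python
import numpy as np

def wls(Xs, vs, pis, ys):
    """Weighted least squares with weights 1/(pi_i v_i), as in (4)."""
    Xw = Xs / (pis * vs)[:, None]
    return np.linalg.solve(Xw.T @ Xs, Xw.T @ ys)

rng = np.random.default_rng(1)
N, n = 300, 50
x = rng.uniform(1.0, 4.0, N)
X = np.column_stack([np.ones(N), x])
v = x ** 2                                  # V_N 1_N = v is not in span{1, x}
y = 1.0 + 2.0 * x + x * rng.normal(size=N)

s = rng.choice(N, size=n, replace=False)
pis = np.full(n, n / N)
Xs, vs, ys = X[s], v[s], y[s]

# Augmented design matrix (6) and augmented regression estimator (7).
W = np.column_stack([v, X])
Ws = W[s]
gamma_s = wls(Ws, vs, pis, ys)
y_areg = W.mean(axis=0) @ gamma_s

# Second part of (5) after replacing X_s, beta_s by W_s, gamma_s.
second_part = np.sum((ys - Ws @ gamma_s) / pis) / N

# GREG-type form with the adjusted weight c1/c2.
beta_s = wls(Xs, vs, pis, ys)
b_v = wls(Xs, vs, pis, vs)                  # weighted regression of v_i on x_i
c1 = np.sum(v - X @ b_v)
c2 = np.sum((vs - Xs @ b_v) / pis)
y_alt = X.mean(axis=0) @ beta_s + (c1 / c2) * np.sum((ys - Xs @ beta_s) / pis) / N

print(y_areg, y_alt, second_part, c1 / c2)
```

Up to rounding, `second_part` vanishes and `y_areg` coincides with `y_alt`, in line with the two displays above.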

In comparing $\bar{y}_{AREG}$ with $\bar{y}_{GREG}$ we may say that $\bar{y}_{AREG}$ is more design based and $\bar{y}_{GREG}$ is
more model based. Of course, $\bar{y}_{GREG}$ is design consistent, but $\bar{y}_{AREG}$ has also a finite sample
design justification in that $\gamma_s$ is the solution of an estimating equation which is design unbiased
for the parameter defining equation of $\gamma_N$. Parameter defining equations are discussed by
Godambe and Thompson (1984, 1986).


3. VARIANCE ESTIMATION AND CONFIDENCE INTERVALS


A method of confidence interval construction which would be consistent with the general
philosophy of estimating functions would be to construct an asymptotically multivariate normal
pivotal based on $h^*$ and an estimator of its variance. Approximate confidence regions for $\gamma_N$
would then correspond to probability regions of the estimated multivariate normal distribution
of this approximate pivotal. However, we are not interested in $\gamma_N$ but in a non-injective
function of $\gamma_N$. We will adopt the more straightforward approach of estimating the variance
of $\bar{y}_{AREG}$ directly.

Särndal, Swensson, and Wretman (1989) have investigated variance estimation for $\bar{y}_{GREG}$
in (5) for the case that the second part is zero. As we have seen in section 2, our estimator
$\bar{y}_{AREG}$ is precisely of that type. Their variance estimator may be written as

$$V_g = \sum_{i \in s} \sum_{j \in s} \check{\Delta}_{ij}\, g_{is} e_{is}\, g_{js} e_{js}, \qquad (8)$$

where $\check{\Delta}_{ij} = (\pi_{ij} - \pi_i\pi_j)/\pi_{ij}$, $\pi_{ij}$ is the design probability that both individuals $i$ and $j$
are included in the sample $s$, $g_{is}$ is the $i$th element of the row vector $\bar{w}_N^T (W_s^T V_s^{-1} \Pi_s^{-1} W_s)^{-1}
W_s^T V_s^{-1}$, and $e_{is} = (y_i - w_i^T\gamma_s)/\pi_i$. See Särndal, Swensson and Wretman (1989) for a detailed
discussion of the model and design properties of $V_g$ in (8). Note that $\bar{y}_{AREG}$ in (7) may be
written as $\bar{y}_{AREG} = \sum_{i \in s} g_{is} y_i/\pi_i$ and

$$\bar{y}_{AREG} - \bar{y}_N = \sum_{i \in s} g_{is} e_{iN} = \bar{w}_N^T(\gamma_s - \gamma_N),$$

where $e_{iN} = (y_i - w_i^T\gamma_N)/\pi_i$. Now, with $V_N 1_N = W_N a$, we have $\bar{w}_N^T = 1_N^T V_N V_N^{-1} W_N/N =
a^T W_N^T V_N^{-1} W_N/N$, so that for large samples $g_{is}$ will be near $1/N$ for $i \in s$. The design variance
of $\bar{y}_{AREG}$ is then approximately equal to

$$\sum_{i \in P} \sum_{j \in P} \Delta_{ij} e_{iN} e_{jN}/N^2,$$

where $\Delta_{ij} = (\pi_{ij} - \pi_i\pi_j)$, and this may be estimated by

$$V_1 = \sum_{i \in s} \sum_{j \in s} \check{\Delta}_{ij} e_{is} e_{js}/N^2. \qquad (9)$$

$V_1$ in (9) was considered in early work on the general regression estimator, for example,
Särndal (1981, 1982). Now $V_g$ in (8) may be thought of as a version of $V_1$ in (9) adjusted for
the realized values of $g_{is}$, $i \in s$. Särndal, Swensson and Wretman (1989) show that $V_g$ in (8), as
well as being design consistent for the design variance of $\bar{y}_{AREG}$, is often model unbiased or
nearly model unbiased for the model mean squared error of $\bar{y}_{AREG}$.
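As an illustration, the two variance estimators (8) and (9) can be sketched for simple random sampling without replacement, where $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/\{N(N-1)\}$. The population and the function name `areg_with_variance` are hypothetical, and the 1.96 multiplier reflects the normal approximation discussed in the text rather than any result specific to this example.

```python
import numpy as np

def areg_with_variance(W, v, y, s, pis, pi_joint):
    """Return (y_AREG, V_g of (8), V_1 of (9), g_is, e_is) for the sample s."""
    N = W.shape[0]
    Ws, vs, ys = W[s], v[s], y[s]
    Ww = Ws / (pis * vs)[:, None]
    A = Ww.T @ Ws                                   # W_s' V_s^{-1} Pi_s^{-1} W_s
    gamma_s = np.linalg.solve(A, Ww.T @ ys)
    wbar = W.mean(axis=0)
    g = (Ws / vs[:, None]) @ np.linalg.solve(A, wbar)   # elements g_is
    e = (ys - Ws @ gamma_s) / pis                   # residuals e_is (divided by pi_i)
    delta_chk = (pi_joint - np.outer(pis, pis)) / pi_joint
    V_g = (g * e) @ delta_chk @ (g * e)
    V_1 = e @ delta_chk @ e / N ** 2
    return wbar @ gamma_s, V_g, V_1, g, e

rng = np.random.default_rng(2)
N, n = 400, 60
x = rng.uniform(1.0, 4.0, N)
v = x ** 2
W = np.column_stack([v, np.ones(N), x])             # augmented design matrix (6)
y = 1.0 + 2.0 * x + x * rng.normal(size=N)

s = rng.choice(N, size=n, replace=False)
pis = np.full(n, n / N)
pi_joint = np.full((n, n), n * (n - 1) / (N * (N - 1)))
np.fill_diagonal(pi_joint, n / N)                   # pi_ii = pi_i

y_areg, V_g, V_1, g, e = areg_with_variance(W, v, y, s, pis, pi_joint)
half = 1.96 * np.sqrt(V_g)
print("approximate 95% interval:", y_areg - half, y_areg + half)
print("N * g_is ranges over:", N * g.min(), N * g.max())
```

Under this design $V_1$ reduces to the familiar $(1 - n/N)\,s_r^2/n$, with $s_r^2$ the sample variance of the raw residuals $y_i - w_i^T\gamma_s$, which gives a convenient check on the implementation.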

Now approximate confidence intervals for $\bar{y}_N$ could be constructed based on a standard
normal approximation to the distribution of $(\bar{y}_{AREG} - \bar{y}_N)/\{V_g\}^{1/2}$. The justification of this
procedure, from both a design and a model point of view, is asymptotic and the question of 
its appropriateness for particular finite samples must be addressed. One possibility is to compare 


Survey Methodology, December 1992 223 


a set of confidence intervals obtained by this procedure to a set of purely model based intervals 
based on a further assumption of normality of errors and a t-statistic. If the two sets of intervals 
are wildly different there may be reason to doubt the validity of the jointly model and design 
based intervals, but more work is needed before this question can be answered satisfactorily. 

An alternative approach to variance estimation in this framework is given by Binder (1983).
The design variance of $h^*$ as an estimator of $g^*$ at $\gamma_N$ could be estimated using standard design
based techniques substituting $\gamma_s$ for $\gamma_N$, and then the variance of $\gamma_s$ as an estimator of $\gamma_N$
would be derived from a Taylor linearization of $h^*$ about $\gamma_s$. Taylor linearization could again
be used to derive an estimator of the variance of a function of $\gamma_s$ as an estimator of the same
function of $\gamma_N$.


4. AREAS FOR FURTHER RESEARCH 


We have seen how the approach described here could be used for the estimation of finite 
population means or, more generally, for functions of linear regression parameters. It is natural 
to wonder whether and how the approach may be adapted to the estimation of other types of 
finite population parameters such as distribution functions and quantiles or to estimation for 
small areas. 

Consider the special case of estimation of a distribution function at one point. There are 
two possible approaches to incorporate covariate information into a model. The first is to model 
the probability explicitly as a function of the covariates, for example through a logistic model. A
second approach, which is common in the context of estimating a distribution function, as 
in Chambers and Dunstan (1986), Rao, Kovar and Mantel (1990), and others, is to model the 
residuals from a regression of the observed variable onto the covariables as being independent 
and identically distributed from some unknown distribution. The present approach requires 
that the parameter of interest be a function of the finite population parameter. Can this 
approach be adapted for the estimation of distribution functions or quantiles? 

Another important problem in survey sampling is small area estimation, that is estimation 
of totals, means or proportions for subsets of the finite population. A good review is given 
in Platek, Rao, Särndal and Singh (1987). An obvious adaptation of the approach of Section 1
is to apply it separately within each domain of interest, what might be described as post-stratified 
generalized regression estimation. Note that this approach would require the totals of the 
covariates for each domain of interest. A very common approach in small area estimation is 
to borrow strength across areas via a model relating small areas to each other and to some 
covariates. A good review is given in Singh, Mantel and Thomas (1991). A very fruitful 
approach has been the empirical Bayes estimation based on random effects models which was 
introduced by Fay and Herriot (1979). Liang and Waclawiw (1990) discuss estimating functions 
for empirical Bayes models. Can the idea of modelling to borrow strength across small areas 
be formulated in such a way that the parameters of interest become functions of a population 
parameter?


Maximum Likelihood Estimation from Complex
Sample Surveys 


ABBA M. KRIEGER and DANNY PFEFFERMANN¹


ABSTRACT 


Maximum likelihood estimation from complex sample data requires additional modeling due to the 
information in the sample selection. Alternatively, pseudo maximum likelihood methods that consist 
of maximizing estimates of the census score function can be applied. In this article we review some of 
the approaches considered in the literature and compare them with a new approach derived from the 
ideas of ‘weighted distributions’. The focus of the comparisons is on situations where some or all of the 
design variables are unknown or misspecified. The results obtained for the new method are encouraging, 
but the study is limited so far to simple situations. 


KEY WORDS: Design adjusted estimators; Ignorable and informative designs; Pseudo likelihood; 
Weighted distributions. 


1. INTRODUCTION 


Survey data are often used for analytic inference about model parameters such as means, 
regression coefficients, cell probabilities etc. The models pertain to the population data and 
are therefore referred to as the census models. The problem in applying ‘classical’ maximum 
likelihood methods to survey data is that the model holding for the sample can be very different 
from the model holding for the population due to sample selection effects. 

In order to illustrate the problem and some of the solutions proposed in the literature, consider
the following simple example. A population $U$ is made up of $N$ units labelled $\{1, \ldots, N\}$.
Associated with unit $i$ is a vector $(Y_i, Z_i)$ of independent measurements drawn from a bivariate
normal distribution with mean $\mu' = (\mu_Y, \mu_Z)$ and variance-covariance (V-C) matrix

$$\Sigma = \begin{pmatrix} \sigma_Y^2 & \sigma_{YZ} \\ \sigma_{YZ} & \sigma_Z^2 \end{pmatrix}.$$

The values $(y_i, z_i)$ are observed for a sample $s$ of $n \ll N$ units selected by a probability
sampling scheme. It is desirable to estimate $\mu_Y$ and $\sigma_Y^2$. We consider three cases distinguished
by the selection process and data availability.


Case A - The sample is selected by simple random sampling with replacement and only the
values $\{(y_i, z_i), i \in s\}$ are known. Denoting the sample labels as $\{1, \ldots, n\}$, we have that
$y_1, \ldots, y_n$ are iid $N(\mu_Y, \sigma_Y^2)$, yielding

$$\hat{\mu}_Y = \bar{y}_s = \sum_{i=1}^{n} y_i/n; \quad \hat{\sigma}_Y^2 = s_y^2 = \sum_{i=1}^{n} (y_i - \bar{y}_s)^2/n \qquad (1.1)$$

as the MLE of $\mu_Y$ and $\sigma_Y^2$. Clearly $E_M(\hat{\mu}_Y) = \mu_Y$ and $E_M\{[n/(n-1)]\hat{\sigma}_Y^2\} = \sigma_Y^2$, where $E_M\{\cdot\}$
defines the expectation under the model, with the sample units held fixed.


¹ Abba M. Krieger, Department of Statistics, University of Pennsylvania, Philadelphia, PA 19104. Danny Pfeffermann,
Department of Statistics, Hebrew University, Jerusalem 91905.

Case B - The sample is selected with probabilities proportional to $z_i$ with replacement such
that at each draw $k = 1, \ldots, n$, $P(i \in s) = P_i = z_i/\sum_{j=1}^{N} z_j$. The data known to the analyst
are $\{y_i, z_i, i \in s\}$ and $\{z_{n+1}, \ldots, z_N\}$. Suppose that Corr$(Y,Z) > 0$. This implies that
$P(Y_i > \mu_Y \mid i \in s) > 1/2$ since the sampling scheme tends to select units with large values of
$Z$ and hence large values of $Y$. Clearly, the estimators defined in (1.1) are no longer MLE in
this case.

The situation just described corresponds to the ‘classical’ example of missing data often
analyzed in the literature (Anderson 1957). The MLE of $\mu_Y$ and $\sigma_Y^2$ are now

$$\hat{\mu}_Y = \bar{y}_s + b(\bar{Z} - \bar{z}_s); \quad \hat{\sigma}_Y^2 = s_y^2 + b^2(S_Z^2 - s_z^2), \qquad (1.2)$$

where $\bar{Z} = \sum_{i=1}^{N} z_i/N$, $\bar{z}_s = \sum_{i=1}^{n} z_i/n$, $b = \sum_{i=1}^{n} (y_i - \bar{y}_s)(z_i - \bar{z}_s)/\sum_{i=1}^{n} (z_i - \bar{z}_s)^2$,
$S_Z^2 = \sum_{i=1}^{N} (z_i - \bar{Z})^2/N$ and $s_z^2 = \sum_{i=1}^{n} (z_i - \bar{z}_s)^2/n$. Notice that the effect of the sample
selection can be dealt with in this case by modeling the joint distribution of the response variable
$Y$ and the design variable $Z$. The sample selection process is then ignorable (see section 2.1).
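A small simulation makes the contrast between (1.1) and (1.2) concrete. This is a sketch under assumptions: the population parameters are hypothetical, $s_y^2$ is taken to be the sample variance with divisor $n$ as in (1.1), and the function names are introduced here for illustration only.

```python
import numpy as np

def mle_case_a(ys):
    """Estimators (1.1): sample mean and variance with divisor n."""
    ybar = ys.mean()
    return ybar, np.mean((ys - ybar) ** 2)

def mle_case_b(ys, zs, z_pop):
    """Estimators (1.2), using the population values of the design variable z."""
    ybar, zbar_s, Zbar = ys.mean(), zs.mean(), z_pop.mean()
    s2_y = np.mean((ys - ybar) ** 2)
    s2_z = np.mean((zs - zbar_s) ** 2)
    S2_Z = np.mean((z_pop - Zbar) ** 2)
    b = np.sum((ys - ybar) * (zs - zbar_s)) / np.sum((zs - zbar_s) ** 2)
    return ybar + b * (Zbar - zbar_s), s2_y + b ** 2 * (S2_Z - s2_z)

# Hypothetical bivariate normal population with Corr(Y, Z) = 0.8.
rng = np.random.default_rng(3)
N, n = 5000, 400
pop = rng.multivariate_normal([10.0, 40.0], [[9.0, 9.6], [9.6, 16.0]], size=N)
y, z = pop[:, 0], pop[:, 1]

# PPS selection with replacement, probabilities proportional to z_i.
s = rng.choice(N, size=n, replace=True, p=z / z.sum())
print("ignoring the design (1.1):", mle_case_a(y[s]))
print("modeling Y and Z    (1.2):", mle_case_b(y[s], z[s], z))
```

When the sample coincides with the population, the correction terms in (1.2) vanish and the two sets of estimators agree.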


Case C - Same as Case B but only the sample values $\{(y_i, z_i), i \in s\}$ and the sample selection
probabilities $\{P_i, i \in s\}$ are known. Even though the values of $z_i$, $i = 1, \ldots, N$ are known at
the sampling stage, it is often the case that information on the design variables or the inclusion
probabilities for units outside the sample is not included in the files released to analysts
performing secondary analysis.

The estimators defined by (1.2) are no longer operational in this case since the population
mean and variance of $Z$ are unknown. For large populations, however, such that $\bar{Z} \cong$ constant,
an approximate MLE of $\mu_Y$ is obtained as $\hat{\mu}_Y^* = \bar{y}_s + b^*(1/N - \bar{P}_s)$ where $\bar{P}_s =
\sum_{i=1}^{n} P_i/n$ and $b^* = \sum_{i=1}^{n} (y_i - \bar{y}_s)(P_i - \bar{P}_s)/\sum_{i=1}^{n} (P_i - \bar{P}_s)^2$. The rationale for $\hat{\mu}_Y^*$ is that
$P_i = z_i/(N\bar{Z})$ so that for $\bar{Z} \cong$ constant, $(Y_i, P_i)$ is bivariate normal with $\bar{P} = \sum_{i=1}^{N} P_i/N = 1/N$.
This estimator is an example of using the sample selection probabilities as surrogates for the
design variables when information on the latter is incomplete, as recommended in Rubin (1985).

A possible way to obtain approximate MLE under Case C is to follow what is known in 
the literature as the pseudo likelihood approach. We describe the approach in more detail in 
section 2, but it basically consists of maximizing a design consistent estimator of the census 
score function, that is, the score function that would have been obtained in the case of a census. 
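The latter is unaffected by the design. Application of this approach yields, under Case C, the
estimators

$$\hat{\mu}_Y = \bar{y}_{ps} = \sum_{i=1}^{n} w_i y_i \Big/ \sum_{i=1}^{n} w_i; \quad \hat{\sigma}_Y^2 = s_{ps}^2 = \sum_{i=1}^{n} w_i (y_i - \bar{y}_{ps})^2 \Big/ \sum_{i=1}^{n} w_i, \qquad (1.3)$$

where $w_i = (1/nP_i)$. Since $\bar{y}_{ps}$ and $s_{ps}^2$ are design consistent for $\bar{Y} = \sum_{i=1}^{N} y_i/N$ and $S^2 =
\sum_{i=1}^{N} (y_i - \bar{Y})^2/N$ respectively, they are also consistent for $\mu_Y$ and $\sigma_Y^2$ in the sense that
plim$_{n \to \infty, N \to \infty}$ $(\bar{y}_{ps}, s_{ps}^2) = (\mu_Y, \sigma_Y^2)$.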
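The two Case C estimators, the pseudo MLE (1.3) and the approximate MLE that uses the $P_i$ as surrogates for the $z_i$, require only the sample values and the selection probabilities. The sketch below is illustrative; the population is hypothetical and the function names are not from the paper.

```python
import numpy as np

def pseudo_mle(ys, P):
    """Pseudo MLE (1.3) with weights w_i = 1/(n P_i)."""
    w = 1.0 / (len(ys) * P)
    ybar_ps = np.sum(w * ys) / np.sum(w)
    s2_ps = np.sum(w * (ys - ybar_ps) ** 2) / np.sum(w)
    return ybar_ps, s2_ps

def mu_star(ys, P, N):
    """Approximate MLE of mu_Y under Case C, regressing y_i on P_i."""
    Pbar_s = P.mean()
    b_star = np.sum((ys - ys.mean()) * (P - Pbar_s)) / np.sum((P - Pbar_s) ** 2)
    return ys.mean() + b_star * (1.0 / N - Pbar_s)

# Hypothetical population and PPS sample with replacement.
rng = np.random.default_rng(5)
N, n = 5000, 400
pop = rng.multivariate_normal([10.0, 40.0], [[9.0, 9.6], [9.6, 16.0]], size=N)
y, z = pop[:, 0], pop[:, 1]
P = z / z.sum()
s = rng.choice(N, size=n, replace=True, p=P)

print("unweighted sample mean:", y[s].mean())
print("pseudo MLE (1.3)      :", pseudo_mle(y[s], P[s]))
print("approximate MLE       :", mu_star(y[s], P[s], N))
```

With equal selection probabilities the pseudo MLE reduces to (1.1), which is a quick way to validate an implementation.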

In this article we discuss a different approach for maximum likelihood estimation that is 
operational in principle even when the only information available to the analyst is the sample 
data. The method is derived from the theory of weighted distributions (Rao 1965, 1985, Patil 
and Rao 1978) and it utilizes the sample selection probabilities. The method is illustrated for 
the case of normal distributions with two different sampling designs and is shown to perform 
well in these cases. Another apparent advantage of the proposed approach emerging from the 
empirical study is that it is not very sensitive to misspecification of the design variables.

In section 2 we review the different approaches for MLE from survey data considered in
the literature. Section 3 outlines the basic steps of the new approach. The empirical study is
described and summarized in section 4. Section 5 contains concluding remarks.


2. REVIEW OF APPROACHES CONSIDERED IN THE LITERATURE 


In this section we review briefly the approaches considered in the literature for MLE or approx- 
imate MLE from survey data. To better understand the complexity of the problem, we first 
discuss the notion of ignorable sampling designs. For a more detailed review of maximum likelihood 
and other approaches for analytic inferences from sample surveys see Pfeffermann (1993). 


2.1 Ignorable and Informative Sampling Designs 


Let $Z' = (Z_1, \ldots, Z_K)$ represent $K$ design (auxiliary) variables used for designing the survey
and denote by $Z = (z_1, \ldots, z_N)'$ the $N \times K$ matrix of measurements on $Z$ so that $z_i$ is the
vector associated with unit $i$. The design variables may include strata indicator variables and
quantitative measurements of cluster and unit characteristics. Let $Y' = (Y_1, \ldots, Y_p)$ represent
the survey response variables. We assume for convenience that $Y$ is separate from $Z$ although
as we mention below and consider in the empirical study, the sample selection probabilities
may depend on the $Y$-values directly. The matrix $Y = (y_1, \ldots, y_N)'$ of the response variables
values can be decomposed as $Y = [Y_s, Y_{\bar{s}}]$ where $Y_s = \{y_i, i \in s\}$ and $Y_{\bar{s}} = \{y_i, i \notin s\}$. Let
$I = (I_1, \ldots, I_N)'$ be a vector of sample inclusion indicators such that $I_i = 1$ for $i \in s$ and
$I_i = 0$ otherwise.

The basic problem of MLE from complex survey data, as illustrated in the introduction,
is that in general, $f(Y_s; \lambda^*) \neq \int f(Y; \lambda)\,dY_{\bar{s}}$ where the symbol $f(\cdot\,; \cdot)$ defines probability density
functions (pdf). As further illustrated in the introduction, this problem can sometimes be
resolved by modeling the joint distribution of $Y$ and $Z$. Thus, suppose that the values of $Z$
are known for every unit in the population and that $Y$ is observed for only the sample units.
The joint pdf of all the available data can be written as

$$f(Y_s, I, Z; \theta, \phi, \rho) = \int f(Y_s, Y_{\bar{s}} \mid Z; \theta_1)\, P(I \mid Y, Z; \rho_1)\, g(Z; \phi)\, dY_{\bar{s}}. \qquad (2.1)$$

Ignoring the sample selection in the inference process implies that inference is based on the
joint distribution of $Y_s$ and $Z$, that is, the probability $P(I \mid Y, Z; \rho_1)$ on the right hand side
of (2.1) is ignored. Hence the inference is based on

$$f(Y_s, Z; \theta, \phi) = \int f(Y_s, Y_{\bar{s}} \mid Z; \theta_1)\, g(Z; \phi)\, dY_{\bar{s}}. \qquad (2.2)$$

The sample selection is said to be ignorable when inference based on (2.1) is equivalent to
inference based on (2.2). This is clearly the case for sampling designs that depend only on the
design variables $Z$, since in this case $P(I \mid Y, Z; \rho_1) = P(I \mid Z; \rho_1)$. The exact conditions for
the ignorability of the sample selection process are defined and illustrated in the articles by
Rubin (1976), Little (1982) and Sugden and Smith (1984).

The complications of MLE from complex survey data based on (2.1) or (2.2) are now
apparent. First and foremost, it requires that all the relevant design variables be identified and
known at the population level. As often argued in the literature (see Pfeffermann 1993 for
references), this is not necessarily the case. Secondly, it requires that the sample selection is
ignorable in the sense discussed above or alternatively that the probabilities $P(I \mid Y, Z; \rho_1)$ be
modeled and included in the likelihood. Finally, the use of MLE requires the specification of
the joint pdf $f(Y, Z; \theta, \phi) = f(Y \mid Z; \theta_1)\, g(Z; \phi)$.


2.2 Exact MLE Based on Factorization of the Likelihood


Factoring the likelihood in the case of multivariate normal data was first suggested by
Anderson (1957). The factorization is possible when the observed data have a nested pattern,
that is, the set of survey variables $X_1, \ldots, X_q$ can be arranged such that $X_j$ is observed for all
units where $X_{j+1}$ is observed, $j = 1, \ldots, (q - 1)$. Extensions to other distributions and
more general data patterns are given in Rubin (1974). Holt, Smith and Winter (1980) apply
the ideas to MLE of regression coefficients from complex survey data.

Suppose that the sample selection is ignorable so that inference can be based on the joint
distribution $f(Y_s, Z; \theta, \phi) = f(Y_s \mid Z; \theta_1)\, g(Z; \phi)$. The likelihood can be factored accordingly as

$$L(\theta, \phi; Y_s, Z) = L(\theta_1; Y_s \mid Z)\, L(\phi; Z). \qquad (2.3)$$

Assuming that the parameters $\theta_1$ and $\phi$ are distinct in the sense of Rubin (1976), MLE of $\theta_1$
and $\phi$ can be calculated independently from the two components.

Application of (2.3) to the case where $(Y_i', Z_i')$ are multivariate normal yields the following
MLE for $\mu_Y = E(Y)$ and $\Sigma_Y = V(Y)$ (Anderson 1957):

$$\hat{\mu}_Y = \bar{y}_s + \hat{B}(\bar{Z} - \bar{z}_s); \quad \hat{\Sigma}_Y = s_{YY} + \hat{B}(S_{ZZ} - s_{ZZ})\hat{B}', \qquad (2.4)$$

where $(\bar{y}_s, \bar{z}_s) = \sum_{i=1}^{n} (y_i, z_i)/n$, $\bar{Z} = \sum_{i=1}^{N} z_i/N$, $S_{ZZ} = \sum_{i=1}^{N} (z_i - \bar{Z})(z_i - \bar{Z})'/N$, $s_{ZZ} =
\sum_{i=1}^{n} (z_i - \bar{z}_s)(z_i - \bar{z}_s)'/n$ and $\hat{B} = \sum_{i=1}^{n} (y_i - \bar{y}_s)(z_i - \bar{z}_s)' s_{ZZ}^{-1}/n$.

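The MLE of the coefficient matrix $B_{12}$ of the multivariate regression of $Y_1$ on $Y_2$ where
$Y' = (Y_1', Y_2')$ is obtained straightforwardly from (2.4). Thus, if

$$\Sigma_Y = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

where $\Sigma_{11} = V(Y_1)$, $\Sigma_{12} = \mathrm{Cov}(Y_1, Y_2) = \Sigma_{21}'$ and $\Sigma_{22} = V(Y_2)$, then $\hat{B}_{12} = \hat{\Sigma}_{12}\hat{\Sigma}_{22}^{-1}$.
For the explicit expression of $\hat{B}_{12}$ see Holt, Smith and Winter (1980).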


2.3. Design Adjusted Estimators (DAE) 


Assume that the sample selection mechanism is ignorable. Let $\ell_N(\theta; Y)$ denote the log
likelihood for $\theta$ that would be obtained in the case of a census. Denote by $h_N(Y \mid Z, Y_s; \theta_1)$
the conditional distribution of $Y$ given $Z$ and $Y_s$ and let $E_{h_N}(\cdot \mid Z, Y_s)$ define the expectation
operator under $h_N$. The DAE $\hat{\theta}_{ND}$ of $\theta$ as proposed by Chambers (1986) is defined as

$$E_{h_N}[-\ell_N(\hat{\theta}_{ND}) \mid Z, Y_s] = \min\{E_{h_N}[-\ell_N(\theta) \mid Z, Y_s];\ \theta \in \Theta\}. \qquad (2.5)$$

Notice that the expectation $E_{ND}(\theta) = E_{h_N}[\ell_N(\theta) \mid Z, Y_s]$ depends on the vector parameter
$\theta_1$ of the conditional distribution $f(Y \mid Z; \theta_1)$. The estimator $\hat{\theta}_{ND}$ of (2.5) is computed by
substituting $\hat{\theta}_1$ for $\theta_1$ where $\hat{\theta}_1$ is the MLE of $\theta_1$ obtained from the data $(Y_s, Z)$.

Simple algebra shows that for the multivariate normal model considered in section 2.2, the
DAE of $\mu_Y$ and $\Sigma_Y$ are the same as the MLE defined by (2.4). A possible advantage of this
approach, however, is that it can be applied to other loss functions.


2.4 The Pseudo Likelihood Approach 


The prominent feature of this approach is that it utilizes the sample selection probabilities 
to estimate the census likelihood equations. The estimated equations are then maximized with 
respect to the vector parameter of interest. No information on the values of the design variables 
is needed, although as illustrated in the empirical study, knowledge of these values at the 
population level can be used to improve the efficiency of the estimators. 

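Suppose that the population values $Y_i$ are independent draws from a common distribution
$f(Y; \theta)$ and let $\ell_N(\theta; Y) = \sum_{i=1}^{N} \log f(y_i; \theta)$ define the census log likelihood function. Under
some regularity conditions, the MLE, $\hat{\theta}$, solves the equations

$$U(\theta) = d\ell_N(\theta; Y)/d\theta = \sum_{i=1}^{N} u(\theta; y_i) = 0, \qquad (2.6)$$

where “d” defines the derivative operator and $u(\theta; y_i) = d \log f(y_i; \theta)/d\theta$. The pseudo MLE
of $\theta$ is defined as the solution of $\hat{U}(\theta) = 0$ where $\hat{U}(\theta)$ is a design consistent estimator of $U(\theta)$
in the sense that plim$_{n \to \infty, N \to \infty}$ $[\hat{U}(\theta) - U(\theta)] = 0$ for all $\theta \in \Theta$. The commonly used
estimator of $U(\theta)$ is the Horvitz-Thompson (1952) estimator so that the pseudo MLE of $\theta$ is
the solution of $\hat{U}(\theta) = \sum_{i=1}^{n} w_i^* u(\theta; y_i) = 0$ where for selection without replacement
$w_i^* = [1/P(i \in s)]$ and for selection with replacement $w_i^* = (1/nP_i)$.

For the multivariate normal model, the pseudo MLE of $\mu_Y$ and $\Sigma_Y$ are

$$\tilde{\mu}_Y = \sum_{i=1}^{n} w_i^* y_i \Big/ \sum_{i=1}^{n} w_i^*; \quad \tilde{\Sigma}_Y = \sum_{i=1}^{n} w_i^* (y_i - \tilde{\mu}_Y)(y_i - \tilde{\mu}_Y)' \Big/ \sum_{i=1}^{n} w_i^*. \qquad (2.7)$$

The pseudo MLE of the matrix coefficients $B_{12}$ is obtained as $\tilde{B}_{12} = \tilde{\Sigma}_{12}\tilde{\Sigma}_{22}^{-1}$.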
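For the multivariate normal model the weighted score equations have the closed-form solution (2.7), so no iteration is needed. A minimal sketch follows; the function names and the partition argument `k` (the number of leading coordinates forming $Y_1$) are hypothetical conveniences.

```python
import numpy as np

def pseudo_mle_mvn(Y, w):
    """Pseudo MLE (2.7): weighted mean vector and weighted V-C matrix."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                       # only relative weights matter
    mu = w @ Y
    D = Y - mu
    Sigma = (D * w[:, None]).T @ D
    return mu, Sigma

def pseudo_B12(Sigma, k):
    """Coefficient matrix of the regression of Y_1 (first k coordinates) on Y_2."""
    S12, S22 = Sigma[:k, k:], Sigma[k:, k:]
    return np.linalg.solve(S22, S12.T).T  # S12 @ inv(S22), using symmetry of S22

# Hypothetical sample of n = 100 trivariate observations with selection probabilities P_i.
rng = np.random.default_rng(6)
n = 100
Y = rng.normal(size=(n, 3))
P = rng.uniform(0.001, 0.003, size=n)
mu_ps, Sigma_ps = pseudo_mle_mvn(Y, 1.0 / (n * P))
print(mu_ps)
print(pseudo_B12(Sigma_ps, 1))
```

With equal weights the estimators coincide with the ordinary sample mean, the divisor-$n$ covariance matrix, and the least squares slopes.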


Various examples for the use of this approach under different models can be found in Skinner 
et al. (1989). See also Binder (1983), Chambless and Boyle (1985), Roberts, Rao and Kumar 
(1987) and Pfeffermann (1988). 

Information on auxiliary design variables known at the population level can be used to 
improve the efficiency of the design estimators of U(@). The ‘‘probability weighted MLE”’ 
as proposed by Nathan and Holt (1980) and by Smith and Holmes (Skinner et al. 1989,
Ch. 8) are examples of the use of the population values of the design variables. The estimators
have the same structure as the exact MLE derived from (2.4) but with unweighted sample
statistics replaced by weighted statistics. For example, $(\bar{y}_s, \bar{z}_s)$ in (2.4) are replaced by
$\sum_{i=1}^{n} w_i^* (y_i, z_i)/\sum_{i=1}^{n} w_i^*$, with similar substitutions for the other expressions.

An important property of pseudo MLE is that they are in general design consistent for the 
population quantities that would be obtained by solving the corresponding census likelihood 
equations, irrespective of whether the model is correct and/or whether the sampling design 
is informative. See Pfeffermann (1993) for the implications of this property with references 
to other studies. Other theoretical properties of pseudo MLE are studied by Godambe and 
Thompson (1986).


3. MLE DERIVED FROM WEIGHTED DISTRIBUTIONS


3.1 General Formulation 


The weighted pdf of a random variable $X^w$ is defined as

$$f^w(x) = w(x) f(x)/\omega, \qquad (3.1)$$

where $f(x)$ is the unweighted pdf and $\omega = \int w(x) f(x)\,dx = E[w(X)]$ is the normalizing factor
making the total probability equal to unity. Situations leading to weighted distributions occur
when realizations $x$ from $f(x)$ are observed and recorded with differential probabilities $w(x)$.
The expectation $\omega$ is then the probability of recording an observation and $f^w(x)$ is the pdf of
the resulting random variable $X^w$.

The concept of weighted distributions was introduced by Rao (1965). Patil and Rao (1978)
discuss various practical situations that give rise to pdf's of the form (3.1). One special case
that occurs in many applications is when $w(x) = |x|$ where $|x|$ is some measure of the size
of $x$. The pdf obtained in this case is called ‘size biased’ or ‘length biased’. The properties of
that distribution under a variety of densities $f(x)$ are examined in Cox (1969) and Patil and
Rao (1978). Estimation of weighted distributions is considered by Vardi (1982).
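A standard textbook instance of (3.1), not taken from the paper, is the size biased exponential distribution: with $f(x) = \lambda e^{-\lambda x}$ and $w(x) = x$, the normalizing factor is $\omega = 1/\lambda$ and $f^w(x) = \lambda^2 x e^{-\lambda x}$, a gamma density with shape 2. The sketch below evaluates (3.1) numerically on a grid; the helper names are hypothetical.

```python
import numpy as np

def trapezoid(fx, x):
    """Trapezoidal rule for values fx on the grid x."""
    return np.sum((fx[1:] + fx[:-1]) * np.diff(x)) / 2.0

def weighted_pdf(f, w, grid):
    """Return the weighted pdf (3.1) on the grid and the normalizing factor omega."""
    fx, wx = f(grid), w(grid)
    omega = trapezoid(wx * fx, grid)      # omega = E[w(X)]
    return wx * fx / omega, omega

rate = 2.0
grid = np.linspace(0.0, 20.0, 200001)
f_w, omega = weighted_pdf(lambda x: rate * np.exp(-rate * x), lambda x: x, grid)

print("omega:", omega)                                   # close to 1/rate
print("mean of X^w:", trapezoid(grid * f_w, grid))       # close to 2/rate
```

The mean of the size biased variable is twice that of the original variable, which is the same mechanism that inflates the unweighted sample mean under PPS selection.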

How can the concept of weighted distributions be utilized for analytic inference from complex
samples? Consider as before a finite population $U = \{1, \ldots, N\}$ with random measurements
$X(i) = x_i' = (y_i', z_i')$ generated independently from a common pdf $h(x_i; \theta) = f(y_i \mid z_i; \theta_1)\,
g(z_i; \phi)$. Suppose that unit $i$ is sampled with probability $w(x_i; \alpha)$ that depends on the
measurements $x_i$ and possibly also on an unknown vector parameter $\alpha$. Denote by $X_i^w$ the
measurements recorded for unit $i \in s$. The pdf of $X_i^w$ is then

$$h^w(x_i; \alpha, \theta) = f(x_i \mid i \in s) = P(i \in s \mid X(i) = x_i)\, h(x_i; \theta)/P(i \in s) = w(x_i; \alpha)\, h(x_i; \theta) \Big/ \int w(x; \alpha)\, h(x; \theta)\,dx. \qquad (3.2)$$

Analytic inference focuses on the vector parameter $\theta$ or functions thereof as the target
parameters. Let $s = [1, \ldots, n]$ define a sample of fixed size $n \ll N$ selected with
replacement such that at each draw $k = 1, \ldots, n$, $P(j \in s) = w(x_j; \alpha)$, $j = 1, \ldots, N$. The
joint pdf of $\{X_i^w, i = 1, \ldots, n\}$ is then $\prod_{i=1}^{n} h^w(x_i; \alpha, \theta)$ so that the likelihood is

$$L(\theta; X_s, s) = \text{const} \times \prod_{i=1}^{n} h(x_i; \theta) \Big/ \left[ \int w(x; \alpha)\, h(x; \theta)\,dx \right]^n, \qquad (3.3)$$

where $X_s' = [x_1, \ldots, x_n]$. The likelihood (3.3) has the following properties:


(1) It is defined in terms of the vector parameter $\theta$. This has an advantage over the use of the
factorized likelihood (2.3) where $\theta$ does not enter the likelihood directly.


(2) It is a function of the selection probabilities $w(x_i; \alpha)$ that enter into the denominator.


(3) The likelihood relates to the conditional distribution of the sample data given the units
in the sample. This is different from the likelihood derived from the pdf in (2.1) which
is the joint pdf of the sample data and the vector $I$ of sample indicators. An example of
the use of the latter pdf in conjunction with weighted distributions for MLE is given in
Godambe and Rajarshi (1989).


(4) The use of the likelihood (3.3) requires a definition of the joint pdf $h(x; \theta)$ holding in the
population and a specification of the relationship between the sample selection probabilities
and the variables observed for the sample. The need to define the population pdf is common
to all of the approaches for MLE proposed in the literature. The specification of the functions
$w(x)$ is unique to the present approach. This step can be carried out however by
modeling the empirical relationship in the sample between the selection probabilities and
the observed measurements. Having identified a suitable model, the probabilities $w(x; \alpha)$
can be estimated from the sample and the estimates can be substituted into the likelihood.
In what follows we consider two examples which are analyzed empirically in section 4.


3.2 Examples 


We assume the model considered in section 2 in which $X_i' = (Y_i', Z_i')$ are independent
realizations from a multivariate normal distribution with mean $\mu_X' = (\mu_Y', \mu_Z')$ and V-C
matrix

$$\Sigma_{XX} = \begin{pmatrix} \Sigma_{YY} & \Sigma_{YZ} \\ \Sigma_{ZY} & \Sigma_{ZZ} \end{pmatrix}.$$

Consider the following sampling designs:


D1 - PPS selection with replacement: Let $T_i = \alpha_1' Y_i + \alpha_2' Z_i$ define a single design variable
and suppose that the sample is selected with probabilities proportional to the $T$-values such
that at each draw $k = 1, \ldots, n$, $P(i \in s) = t_i/(N\bar{T})$, $i = 1, \ldots, N$ where $\bar{T} = \sum_{i=1}^{N} t_i/N$. We
assume that $N$ is large enough so that the difference between $\bar{T}$ and $\mu_T = E(T)$ can be
ignored. The coefficients $\alpha' = (\alpha_1', \alpha_2')$ are fixed. In special cases $\alpha_1 = 0$, hence $T$ is a function
of only the auxiliary design variables $Z$, or $\alpha_2 = 0$, in which case $T$ is only a function of
the response variables $Y$. Suppose as before that it is desirable to estimate the mean $\mu_Y$ and
the V-C matrix $\Sigma_{YY}$, or functions thereof.

When $a_1 = 0$ and $T$ is known for every unit in the population, one can estimate the 
unknown parameters using the factorization (2.3). The corresponding MLE are given in (2.4) 
with $Z$ replaced by $T$. Suppose however that the only information available to the analyst is 
the sample values $x_i' = (y_i', z_i')$, $i = 1, \ldots, n$ and the sample selection probabilities 
$P_i = t_i/(N\bar{T})$. Under the assumption $\bar{T} = \mu_T$, the likelihood for $[\mu_X, \Sigma_{XX}]$ 
can be written using (3.3) as 

$$L(\mu_X, \Sigma_{XX}; X_s, s) = \prod_{i=1}^{n} (a' x_i)\, \phi(x_i; \mu_X, \Sigma_{XX}) \big/ (a_1' \mu_Y + a_2' \mu_Z)^n, \qquad (3.5)$$


where $\phi(x_i; \mu_X, \Sigma_{XX})$ is the normal pdf with mean $\mu_X$ and V-C matrix $\Sigma_{XX}$. The 
likelihood in (3.5) is also a function of the unknown vector of coefficients $a$. However, the values 
of $a$ can actually be found up to a constant $c$ (which cancels out in the likelihood) by regressing 
the sample selection probabilities $P_i$ against $x_i$. 
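
The regression step just described can be sketched as follows. This is a minimal illustration, not the authors' code: the population values, the coefficient vector `a`, and the function name `recover_design_coefficients` are all hypothetical, and the fit omits an intercept because, under D1, $P_i$ is exactly proportional to $a' x_i$.

```python
import numpy as np

def recover_design_coefficients(x, p):
    """Least-squares fit of p_i on x_i without an intercept.

    With p_i = t_i / (N * T_bar) and t_i = a'x_i, the fitted vector equals
    a / (N * T_bar), that is, a up to a multiplicative constant.
    """
    coef, *_ = np.linalg.lstsq(x, p, rcond=None)
    return coef

rng = np.random.default_rng(0)
N = 1000
# Hypothetical population measurements; values are far from zero so t_i > 0.
x = rng.normal(loc=20.0, scale=3.0, size=(N, 3))
a = np.array([0.5, 0.25, 0.25])       # hypothetical design coefficients
t = x @ a                             # size variable t_i = a'x_i
p = t / t.sum()                       # draw probabilities t_i / (N * T_bar)

# PPS selection with replacement, then the regression on sample data only.
sample = rng.choice(N, size=100, replace=True, p=p)
a_scaled = recover_design_coefficients(x[sample], p[sample])
```

Only the ratios between the elements of `a_scaled` matter, since the unknown constant cancels between the numerator and the denominator of the likelihood.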

In the simulation study described in section 4, we consider the case where not all the design 
variables are known even for the sample units. Thus, suppose that $Z_i' = (Z_{1i}, Z_{2i})$ and that 
the data available to the analyst consist of the selection probabilities $P_i$, $i = 1, \ldots, n$ and 
the observations $\{x_i^{*\prime} = (y_i', z_{1i}),\ i = 1, \ldots, n\}$. The likelihood (3.3) is now 

$$L(\mu_X^*, \Sigma_{XX}^*; X_s^*, s) = \prod_{i=1}^{n} w(x_i^*)\, \phi(x_i^*; \mu_X^*, \Sigma_{XX}^*) \big/ (w^*)^n, \qquad (3.6)$$

where $w(x_i^*)$ are the selection probabilities expressed as functions of $x_i^*$. Clearly, the 
probabilities $w(x_i^*)$ are not fully determined by the values $x_i^*$ unless $a_{22} = 0$. Assuming 
normality, 

$$w(x_i, a) = a_0^* + a_1^{*\prime} y_i + a_2^* z_{1i} + \epsilon_i, \qquad (3.7)$$

where $\{\epsilon_i\}$ is white noise. Thus, the likelihood (3.6) can be approximated by substituting 
$w^*(x_i^*) = a_0^* + a_1^{*\prime} y_i + a_2^* z_{1i}$ for $w(x_i^*)$. The values of 
$a^* = (a_0^*, a_1^{*\prime}, a_2^*)'$ can be estimated from the regression (3.7) and then substituted 
into the likelihood. 


D2 - Stratified sampling with $T$ as the stratification variable: Suppose that the population $U$ 
is divided into $L$ strata $U_1, \ldots, U_L$ of sizes $N_1, \ldots, N_L$, $\sum_{h=1}^{L} N_h = N$, based on 
the ascending values of $T$. Consider a simple random stratified sample of size $n = \sum_{h=1}^{L} n_h$, 
selected without replacement with fixed sample sizes $\{n_h\}$. The weighted pdf of $X_i^w$, the 
measurements recorded for unit $i \in s$, is in this case [compare with (3.2)] 

$$h^w(x_i; \theta, s) = f(x_i \mid i \in s) = \begin{cases} P_1\, h(x_i; \theta)/w & \text{if } t_i \le t^{(1)} \\ P_h\, h(x_i; \theta)/w & \text{if } t^{(h-1)} < t_i \le t^{(h)} \\ P_L\, h(x_i; \theta)/w & \text{if } t^{(L-1)} < t_i \end{cases} \qquad (3.8)$$

where $P_h = n_h/N_h$ and for $\{N_h\}$ sufficiently large, the probability $w = P(i \in s)$ can be 
closely approximated as 

$$w = P(i \in s) \approx P_1 \int_{-\infty}^{t^{(1)}} \phi(t)\,dt + \sum_{h=2}^{L-1} P_h \int_{t^{(h-1)}}^{t^{(h)}} \phi(t)\,dt + P_L \int_{t^{(L-1)}}^{\infty} \phi(t)\,dt, \qquad (3.9)$$

where $\phi(t)$ denotes the normal pdf of $T$. 
Suppose that the strata are large enough so that selection within the strata can be considered 
as independent. Define $\mu_T = E(T) = a'\mu_X$, $\sigma_T^2 = \mathrm{Var}(T) = a'\Sigma_{XX}a$ and let 
$\Phi_h = \int_{-\infty}^{t^{(h)}} \phi(t)\,dt$. For given boundaries $\{t^{(h)}\}$ and the vector of 
coefficients $a$, the likelihood for $\theta$ can be written as 

$$L(\theta; X_s, s) = \text{const} \times \prod_{i=1}^{n} h(x_i; \theta) \prod_{h=1}^{L} P_h^{n_h} \Big/ \Big\{ P_1 \Phi_1 + \sum_{h=2}^{L-1} P_h [\Phi_h - \Phi_{h-1}] + P_L [1 - \Phi_{L-1}] \Big\}^n. \qquad (3.10)$$

Hausman and Wise (1981) use a variant of the likelihood (3.10) for estimating the vector 
of regression coefficients in a situation where the strata boundaries are determined by the values 
of the dependent variable. They assume that the strata boundaries are known, but allow the 
selection probabilities within the strata to be unknown in which case they are included in the 
set of unknown parameters with respect to which the likelihood is maximized. 
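
The denominator of (3.10), which is the approximation (3.9) to $w = P(i \in s)$, can be evaluated directly from the normal cdf of $T$. The sketch below is illustrative only: the function names and the numerical inputs are hypothetical, and it assumes $T$ is normal with known mean and standard deviation, as in the text.

```python
import math

def normal_cdf(t, mu, sigma):
    """Cdf of a normal(mu, sigma^2) variable evaluated at t."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def inclusion_probability(boundaries, rates, mu_t, sigma_t):
    """Approximate w = P(i in s) for a stratified design, as in (3.9).

    boundaries: ascending strata boundaries t^(1), ..., t^(L-1)
    rates:      sampling fractions P_1, ..., P_L, with P_h = n_h / N_h
    """
    if len(rates) != len(boundaries) + 1:
        raise ValueError("need exactly one more sampling fraction than boundaries")
    # Phi_0 = 0 and Phi_L = 1 close the first and last strata.
    cdf = [0.0] + [normal_cdf(b, mu_t, sigma_t) for b in boundaries] + [1.0]
    return sum(p * (hi - lo) for p, lo, hi in zip(rates, cdf[:-1], cdf[1:]))

# Hypothetical three-stratum design in which sampling fractions increase with T.
w = inclusion_probability(boundaries=[15.0, 25.0], rates=[0.05, 0.10, 0.20],
                          mu_t=20.0, sigma_t=7.0)
```

When all sampling fractions are equal, the function returns that common fraction, which is the case of a design that carries no information about $T$.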

In many practical situations, the strata boundaries are unknown and have to be estimated 
from the sample data. When the data include the values $\{t_i,\ i = 1, \ldots, n\}$, the vector $a$ can 
be estimated from the regression of $t_i$ on $x_i$, as in the PPS example discussed before. 
Furthermore, if $(t_{(1)} \le \ldots \le t_{(n)})$ are the ordered values of the $t_i$'s, the strata boundaries 
can be estimated as $\hat{t}^{(1)} = \frac{1}{2}(t_{(n_1)} + t_{(n_1+1)}), \ldots, \hat{t}^{(L-1)} = \frac{1}{2}(t_{(n^*)} + t_{(n^*+1)})$, where 
$n^* = \sum_{h=1}^{L-1} n_h$. Substituting these estimates into (3.10) yields an approximation to the 
likelihood which can then be maximized as a function of $\theta$. 
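
A minimal sketch of this boundary estimate follows. It assumes that each estimated boundary is the midpoint between the largest sampled $t$-value in one stratum and the smallest sampled value in the next, with strata formed on ascending $t$; the function name and the data are hypothetical.

```python
import numpy as np

def estimate_boundaries(t_sample, stratum_sizes):
    """Midpoint estimates of the L-1 strata boundaries from sampled t-values.

    t_sample:      the n sampled values of the design variable
    stratum_sizes: the fixed sample sizes n_1, ..., n_L (summing to n)
    """
    t_sorted = np.sort(np.asarray(t_sample, dtype=float))
    if int(np.sum(stratum_sizes)) != t_sorted.size:
        raise ValueError("stratum sample sizes must sum to the sample size")
    # Cumulative counts n_1, n_1 + n_2, ..., n* mark the last unit of each stratum.
    last = np.cumsum(stratum_sizes)[:-1]
    return 0.5 * (t_sorted[last - 1] + t_sorted[last])

# Hypothetical sample of six units, two per stratum.
boundaries = estimate_boundaries([5.0, 1.0, 3.0, 2.0, 4.0, 6.0], [2, 2, 2])
```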

The situation is more complicated when the values $t_i$ are unknown even for units in the 
sample. In the simulation study we attempt to deal with this problem by predicting $t_i$ using 
Fisher's Linear Discriminant Function, that is, specifying the vector of coefficients $\hat{a}$ to be such 
that it maximizes the ratio of the between groups sum of squares to the within groups sum of 
squares of linear combinations $a' X_i$. The groups are the strata. Once the predictors $\hat{t}_i = \hat{a}' x_i$ 
are formed, the strata boundaries are estimated as in the previous case but with $\hat{t}_i$ instead of 
$t_i$. Also, $\hat{\mu}_T = \hat{a}' \mu_X$ and $\hat{\sigma}_T^2 = \hat{a}' \Sigma_{XX} \hat{a}$. Substituting these estimators in (3.10) yields an 
approximation to the likelihood which can be maximized with respect to $\theta$. 

As in the PPS example, the likelihood (3.10) can be modified to the case where only some 
of the design variables are known or observed. Maximization of the modified likelihood is 
carried out following the same steps as above. 


4. SIMULATION RESULTS 


4.1 General 


In order to illustrate and compare the performance of the various MLE procedures described 
in this paper, we ran a small simulation study which consists of two stages. In the first stage 
we generated a single finite population of size $N$ = 8,000 such that $x_i' = (y_{1i}, y_{2i}, z_{1i}, z_{2i})$, 
$i$ = 1, ..., 8,000 are multivariate normal. In the second stage we selected independent samples 
of size $n$ = 800 using the two sampling schemes described in section 3.2 with two different 
definitions for the design variable. The number of samples selected in each case was 300. We 
computed the various estimators for each of the samples based on the available sample data 
and then computed the empirical bias and root mean square error (RMSE) over the selected 
samples. In order to study and compare the conditional properties of the estimators considered, 
we classified the 300 samples selected in each case into 10 groups, based on the ascending values 
of the sample mean of the design variable and computed the bias and RMSE within each of 
the groups. In what follows we describe the various stages in some more detail. 
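
The two summaries described above can be sketched as follows. This is an illustration under assumptions, not the authors' program: the function names are hypothetical, and the groups are formed by sorting the samples on the sample mean of the design variable and cutting the sorted list into equal parts.

```python
import numpy as np

def bias_and_rmse(estimates, true_value):
    """Empirical bias and root mean square error over repeated samples."""
    err = np.asarray(estimates, dtype=float) - true_value
    return float(err.mean()), float(np.sqrt(np.mean(err ** 2)))

def conditional_bias_and_rmse(estimates, design_means, true_value, n_groups=10):
    """Bias and RMSE within groups of samples ordered by the design-variable mean."""
    estimates = np.asarray(estimates, dtype=float)
    order = np.argsort(design_means)            # ascending sample means
    return [bias_and_rmse(estimates[idx], true_value)
            for idx in np.array_split(order, n_groups)]

# Hypothetical estimates of a mean whose true value is 40, from four samples.
overall = bias_and_rmse([39.0, 41.0, 40.0, 42.0], 40.0)
by_group = conditional_bias_and_rmse([39.0, 41.0, 40.0, 42.0],
                                     [3.0, 1.0, 2.0, 4.0], 40.0, n_groups=2)
```

With 300 samples and `n_groups=10`, each group holds 30 samples, matching the classification used in the study.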


4.2 Generation of the Population Values and Sample Selection Schemes 


Values of $z_{1i}$ and $z_{2i}$ were generated independently from a normal $(20, 10^2)$ distribution. 
Values $y_{1i}$ were generated as $y_{1i} = z_{1i} + z_{2i} + \epsilon_{1i}$; $\epsilon_{1i} \sim N(0, 10^2)$. Values $y_{2i}$ were generated 
as $y_{2i} = y_{1i} + 0.5 z_{1i} + 0.5 z_{2i} + \epsilon_{2i}$; $\epsilon_{2i} \sim N(0, 20^2)$. 
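
A hedged sketch of this generation step is given below. It assumes the reading normal(20, 10²) for the design variables and error standard deviations of 10 and 20; the function names and the seed are hypothetical. The helper `implied_moments` works out the mean vector and V-C matrix implied by these equations, which gives $E(Y_1) = 40$, $\mathrm{Var}(Y_1) = 300$ and a slope of $Y_2$ on $Y_1$ of $400/300 \approx 1.33$, in agreement with the true values quoted in Tables 1-3.

```python
import numpy as np

def generate_population(n_units=8000, seed=0):
    """Generate (y1, y2, z1, z2) for every unit, following section 4.2."""
    rng = np.random.default_rng(seed)
    z1 = rng.normal(20.0, 10.0, n_units)
    z2 = rng.normal(20.0, 10.0, n_units)
    y1 = z1 + z2 + rng.normal(0.0, 10.0, n_units)
    y2 = y1 + 0.5 * z1 + 0.5 * z2 + rng.normal(0.0, 20.0, n_units)
    return np.column_stack([y1, y2, z1, z2])

def implied_moments():
    """Mean and V-C matrix of (y1, y2, z1, z2) implied by the generating equations."""
    # Rows express (y1, y2, z1, z2) as linear functions of (z1, z2, e1, e2).
    A = np.array([[1.0, 1.0, 1.0, 0.0],
                  [1.5, 1.5, 1.0, 1.0],
                  [1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
    D = np.diag([100.0, 100.0, 100.0, 400.0])   # variances of z1, z2, e1, e2
    mean = A @ np.array([20.0, 20.0, 0.0, 0.0])
    return mean, A @ D @ A.T

population = generate_population()
mean, cov = implied_moments()
slope_y2_on_y1 = cov[1, 0] / cov[0, 0]
```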


234 Krieger and Pfeffermann: ML Estimation from Complex Sample Surveys 


We employed the two sampling schemes described in section 3.2 using two different definitions 
for the design size variable: (i) $t_i = 0.5(z_{1i} + z_{2i})$ and (ii) $t_i = 0.25(y_{1i} + y_{2i} + z_{1i} + z_{2i})$. 
Thus, selection based on the first design variable satisfies the ignorability conditions defined 
in section 2.1, provided that the data for $(Z_1, Z_2)$ are known for the entire population. When 
these data are only known for the sample, the sampling design is ignorable only with respect 
to the conditional distribution $f(y_1, y_2 \mid Z_1, Z_2)$. When selection is based on the second design 
variable, the sampling design is informative. 

For the stratified selection D2, we generated eight equal sized strata defined by the ascending 
values of the size variable. The sample sizes within the strata were such that they increase with 
increasing values of the $t_i$'s. 


4.3 Estimators Considered 


The parameters estimated in our study are the mean vector and the V-C matrix of the 
marginal distribution of $(Y_1, Y_2)$. We consider seven different estimators for the design D1 
and nine estimators for the design D2. See section 3.2 for a description of the computations 
involved in the derivation of the various estimators. 


DESIGN D1 

ML($Z_1, Z_2$) - The exact MLE for the case where the design is ignorable (equation 2.4). 

WML($Z_1, Z_2$) - The estimators obtained from ML($Z_1, Z_2$) by replacing the unweighted 
sample statistics by probability weighted statistics (see the discussion 
below equation 2.7). 

ML($Z_1$) - Same as ML($Z_1, Z_2$) but with $Z_1$ as the only design variable so that 
$Z = Z_1$. 

WML($Z_1$) - Same as WML($Z_1, Z_2$) but with $Z_1$ as the only design variable. 

CPL - The classical pseudo likelihood estimators (equations 2.7). 

WDML($X^*$) - The (weighted distribution) estimators obtained by maximization of the 
likelihood in (3.6). 

WDML($X^*, Z_1$) - The estimators obtained by maximizing the likelihood in (3.6) but with 
the mean and variance of $Z_1$ fixed at their population values. 


DESIGN D2 


The first 5 estimators are the same as the estimators for the design D1. The other 4 estimators 
are defined as follows: 

WDML($X^*$) - The estimators obtained by maximizing the likelihood (3.10) with the 
$a^*$-coefficients (equation 3.7) estimated by the linear discriminant function. 

WDML($X^*, Z_1$) - Same as WDML($X^*$) but with the mean and variance of $Z_1$ fixed at 
their population values. 

WDML($X^*, t_s$) - The estimators obtained by maximizing the likelihood (3.10) when the 
values $t_s = (t_1, \ldots, t_n)$ are known for units in the sample. 

WDML($X^*, t_s, Z_1$) - Same as WDML($X^*, t_s$) but with the mean and variance of $Z_1$ fixed at 
their population values. 

It should be emphasized that the estimators derived based on the weighted distributions are 
not really MLE because of the approximations involved in the maximization procedures as 
described in section 3.2 (see also comment 2 below). 

Comments 

(1) The estimators we consider can be classified according to the sample and population data 
they use and according to whether the design variables are correctly specified and the 
ignorability conditions are met. Thus, the estimators ML($Z_1, Z_2$) and WML($Z_1, Z_2$) use 
the population values of $Z_1$ and $Z_2$ and the sample values of $Y_1$ and $Y_2$. As mentioned in 
section 2.4 and further discussed in Pfeffermann (1993), the use of WML($Z_1, Z_2$) is to 
protect against possible model misspecifications or informative sampling schemes. The 
estimators ML($Z_1$), WML($Z_1$), WDML($X^*, Z_1$) and WDML($X^*, t_s, Z_1$) use the known 
population data for $Z_1$ but not the data for $Z_2$, even for the sample units. The use of these 
estimators corresponds to situations where the design variables are misspecified or the values 
of some of them are unknown. The estimator WDML($X^*$) uses only the sample 
information for $Y_1$, $Y_2$ and $Z_1$ and the sample selection probabilities. The estimator 
WDML($X^*, t_s$) uses in addition the sample values of the design variable. The estimator 
CPL uses only the sample values of $Y_1$ and $Y_2$ and the sample selection probabilities. 

(2) We maximized the likelihood derived from the weighted distributions using a quasi-Newton 
method in the subroutine library IMSL. The method employed requires partial derivatives 
of the likelihood with respect to each of the parameters as user supplied input. An issue 
that arose in the maximization is worth mentioning. It is easier to parameterize the 
likelihood in terms of $\Sigma^{-1}$ where $\Sigma$ is the covariance matrix among $Y_1$, $Y_2$ and $Z_1$. 
Furthermore, to ensure that the six parameters that define $\Sigma^{-1}$ are unconstrained, we use 
the elements of the upper triangular matrix $B$ so that $B'B = \Sigma^{-1}$. Any choice of the 
values for $B$ leads to a matrix $\Sigma^{-1}$ that is positive semi-definite. 
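
The reparameterization in comment (2) can be illustrated with a short sketch. The function name and the parameter values are hypothetical; the point is only that any six real numbers placed in an upper triangular $B$ give a symmetric, positive semi-definite $B'B$, so an optimizer can move freely over the parameters.

```python
import numpy as np

def precision_from_params(params):
    """Map six unconstrained parameters to a 3x3 matrix B'B.

    The parameters fill the upper triangle of B row by row; B'B is symmetric
    and positive semi-definite for every choice of the six values.
    """
    B = np.zeros((3, 3))
    B[np.triu_indices(3)] = params
    return B.T @ B

# Hypothetical parameter values, as a quasi-Newton routine might propose them.
precision = precision_from_params([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
eigenvalues = np.linalg.eigvalsh(precision)
```

If every diagonal element of $B$ is nonzero the result is positive definite and can be inverted to recover $\Sigma$; a zero on the diagonal gives a singular matrix, which is why the text says semi-definite.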


4.4 Results 


We present the results obtained when estimating $\mu_1 = E(Y_1)$, $\sigma_1^2 = \mathrm{Var}(Y_1)$ and $B_{21}$, 
the slope coefficient in the regression of $Y_2$ on $Y_1$, as representative of the results obtained 
when estimating the other parameters. Tables 1-3 contain the RMSE of the various estimators 
as obtained for the two sampling schemes and the two choices of the design variable. RMSE's 
dominated by large biases are indicated by an asterisk. 


The main results emerging from the tables (and from estimating the other model parameters) 
can be summarized as follows: 

(1) The estimator ML($Z_1, Z_2$) outperforms all of the other estimators when the ignorability 
conditions are met, but it is severely biased when the sampling design is informative. The 
estimator WML($Z_1, Z_2$) is essentially unbiased in all of the cases, but the use of the 
sampling weights increases the variance. Still, this estimator dominates in general the 
estimator CPL, especially under the PPS design, because of the use of the population values 
of $(Z_1, Z_2)$. 

(2) The estimator ML($Z_1$) is severely biased in almost all of the cases. Notice in particular 
the large biases in the case where $t_i = 0.5(z_{1i} + z_{2i})$, illustrating the sensitivity of the 
MLE's to the exact specification of the design variables. As with WML($Z_1, Z_2$), the 
estimator WML($Z_1$) is unbiased, and for the PPS design it outperforms the estimator CPL. 

(3) The estimator CPL is unbiased in all of the cases. An interesting result emerging from the 
tables is that relative to the other estimators considered, it performs better in estimating 
the mean than in estimating variances and covariances. An intuitive explanation for this 
outcome is that in the latter case the sampling weights are used twice, thereby increasing 
the variance. 




Table 1 

RMSE of Estimators of $\mu_1$ for Different Sampling Schemes and Design Variables 
(True Mean: $\mu_1$ = 40) 

                         D1 - PPS Sampling              D2 - Stratified Sampling 
Estimators             t_i = 0.5z_i  t_i = 0.25x_i    t_i = 0.5z_i  t_i = 0.25x_i 
ML(Z_1,Z_2)               0.43          1.86*            0.47          3.43* 
WML(Z_1,Z_2)              0.43          0.57             0.50          0.52 
ML(Z_1)                   2.67*         4.38*            6.39*         5.32* 
WML(Z_1)                  0.58          0.90             0.62          0.58 
WDML(X*,Z_1)              0.56          0.63             [illegible]   0.59 
WDML(X*)                  0.80          0.90             [illegible]   0.49 
CPL                       [illegible]   [illegible]      0.56          0.47 
WDML(X*,t_s)              -             -                0.74          0.43 
WDML(X*,t_s,Z_1)          -             -                0.74          0.57 

* RMSE dominated by bias. 

Table 2 

RMSE of Estimators of $\sigma_1^2$ for Different Sampling Schemes and Design Variables 
(True Variance: $\sigma_1^2$ = 300) 

                         D1 - PPS Sampling              D2 - Stratified Sampling 
Estimators             t_i = 0.5z_i  t_i = 0.25x_i    t_i = 0.5z_i  t_i = 0.25x_i 
ML(Z_1,Z_2)               12.33         [illegible]*     16.00         29.00* 
WML(Z_1,Z_2)              14.00         18.72            20.87         19.83 
ML(Z_1)                   74.92*        33.66*           35.16*        53.66* 
WML(Z_1)                  18.61         26.61            24.22         20.35 
WDML(X*,Z_1)              14.36         17.41            26.94*        15.49 
WDML(X*)                  16.37         19.68            41.08*        15.34 
CPL                       21.11         29.06            24.19         20.18 
WDML(X*,t_s)              -             -                26.18*        15.46 
WDML(X*,t_s,Z_1)          -             -                25.70*        [illegible] 

* RMSE dominated by bias. 


Survey Methodology, December 1992 237 


Table 3 

RMSE of Estimators of $B_{21}$ for Different Sampling Schemes and Design Variables 
(True Coefficient: $B_{21}$ = 1.33) 

                         D1 - PPS Sampling              D2 - Stratified Sampling 
Estimators             t_i = 0.5z_i  t_i = 0.25x_i    t_i = 0.5z_i  t_i = 0.25x_i 
ML(Z_1,Z_2)               0.043         0.069*           0.048         0.120* 
WML(Z_1,Z_2)              0.054         0.060            0.068         0.066 
ML(Z_1)                   0.045         0.078*           0.056         [illegible]* 
WML(Z_1)                  0.055         0.062            0.069         0.065 
WDML(X*,Z_1)              0.043         0.047            0.049         0.045 
WDML(X*)                  0.044         0.049            0.050         0.046 
CPL                       0.055         0.063            0.069         0.065 
WDML(X*,t_s)              -             -                0.048         0.045 
WDML(X*,t_s,Z_1)          -             -                0.048         0.045 

* RMSE dominated by bias. 


(4) For the PPS design, the estimators WDML($X^*$) and WDML($X^*, Z_1$) perform very well, 
with WDML($X^*$) clearly dominating CPL and WDML($X^*, Z_1$) dominating WML($Z_1$). 
Interestingly, the estimator WDML($X^*$) performs in general better than the estimator 
WML($Z_1$) despite the use of less information. The fact that WDML($X^*$) outperforms 
CPL could be explained by the fact that it is more "model dependent", although as 
discussed in section 2.4, one way of viewing CPL is as the estimator maximizing the design 
unbiased estimator of the likelihood equations holding in the population. 


(5) Next consider the stratified design. In the case where $t_i = 0.25x_i$, the picture is very similar 
to the PPS case with WDML($X^*$) dominating again both CPL and WML($Z_1$). Actually, 
there is little to choose in this case among the four estimators derived from the weighted 
distribution likelihood despite the use of different sample and population data by each 
estimator. When $t_i = 0.5z_i$, all of the four estimators are inferior to WML($Z_1$) and CPL 
although, interestingly enough, not with respect to the estimation of the regression 
coefficient, where they all perform very similarly to the optimal ML($Z_1, Z_2$). The particularly poor 
performance of WDML($X^*$) (and to a much lesser extent of WDML($X^*, Z_1$)) in 
estimating the mean and variance is mainly the result of incorrect specification of the strata 
boundaries and hence incorrect specification of the denominator of the likelihood (3.10). 
This problem can possibly be resolved by either including the strata boundaries and the 
$a^*$-coefficients relating the values $t_i$ to the observed data (equation 3.7) as part of the 
unknown parameters in the likelihood (3.10), or by replacing the linear discriminant 
function by some other (nonlinear) function such as logistic regression. The latter approach 
has the advantage of reducing the number of parameters over which the likelihood has to 
be maximized, which can be crucial when the number of strata is large. 

So far we have considered the unconditional bias and RMSE of the estimators. As mentioned 
in section 4.1, we also studied conditional properties by computing the bias and RMSE's over 
samples with similar sample means of the design variable. The conclusions reached from that 
study are very similar to the conclusions stated before. Thus, estimators which are 
approximately unbiased unconditionally are also approximately conditionally unbiased and vice versa. 

This result is somewhat surprising because it has often been illustrated in the literature that 
the CPL estimator, for example, has poor conditional properties. Possible explanations in our 
case are that the sample size considered is large or that the division of the samples into the ten 
groups was not sharp enough. Because of space limitations we omit the results illustrating 
conditional properties of the estimators. 


5. CONCLUDING REMARKS 


The results of the simulation study show that estimators obtained by maximizing the 
likelihood derived from weighted distributions are a favorable alternative to the pseudo 
likelihood estimators obtained by maximizing design consistent estimators of the census 
likelihood equations. The estimators perform particularly well in our study when using an 
informative sampling scheme for which the "classical" MLE can become severely biased. The 
use of these estimators requires, however, the modeling of the relationship between the sample 
selection probabilities and the observed sample data. As illustrated in the simulation study, 
failure to model or estimate the relationship correctly may introduce large biases. 

The key question to the practical use of these estimators is therefore whether the model 
relating the sample selection probabilities to the observed response and design variables can 
be successfully identified from the sample data. It would seem that this question can only be 
answered by considering actual surveys that use common sampling designs. Other important 
questions related to the use of these estimators are the availability of reliable variance estimators 
so that accurate confidence intervals can be set and the protection against misspecification of 
the parent distribution of the response variables in the population. These two questions are 
common to other MLE procedures. We hope that the initial results of our study will encourage 
further research on these and other related questions. 


Methods for Estimating the Precision of Survey Estimates 
when Imputation Has Been Used 


CARL-ERIK SÄRNDAL¹ 


ABSTRACT 


In almost all large surveys, some form of imputation is used. This paper develops a method for variance 
estimation when single (as opposed to multiple) imputation is used to create a completed data set. 
Imputation will never reproduce the true values (except in truly exceptional cases). The total error of 
the survey estimate is viewed in this paper as the sum of sampling error and imputation error. Conse- 
quently, an overall variance is derived as the sum of a sampling variance and an imputation variance. 
The principal theme is the estimation of these two components, using the data after imputation, that 
is, the actually observed values and the imputed values. The approach is model assisted in the sense that 
the model implied by the imputation method and the randomization distribution used for sample selection 
will together determine the appearance of the variance estimators. The theoretical findings are confirmed 
by a Monte Carlo simulation. 


KEY WORDS: Single value imputation; Variance estimation; Imputation model; Model assisted 
inference. 


1. DIFFERENT TYPES OF IMPUTATION 


This paper reports work carried out in connection with the development of Statistics 
Canada’s Generalized Estimation System (GES). Variance estimates are to be routinely 
calculated in the different estimation modules that define the GES. There was a need to develop 
suitable methods for variance estimation when the data set contains imputed values, which 
is the case in practically all surveys. 

Two principal approaches to estimation with missing data are weighting and imputation. 
In the recent literature, the weights used to compensate for nonresponse are usually viewed 
as the inverse of the response probabilities associated with an assumed response mechanism. 
Since the response probabilities are ordinarily unknown, they need to be estimated from the 
available data. Imputation, on the other hand, has the advantage that it yields a complete data 
matrix. Such a matrix simplifies data handling, but it does not imply that ‘‘standard estimation 
methods’’ can be used directly. The imputed values are sample-based, thus they have their own 
statistical properties, such as a mean and a variance. 

In our age, imputation is an extensively used tool. It is interesting to note what Pritzker, 
Ogus and Hansen (1965) say about imputation policy at the US Bureau of the Census: ‘‘Basically 
our philosophy in connection with the problem of ... imputation is that we should get 
information by direct measurement on a very high proportion of the aggregates to be tabulated, 
with sufficient control on quality that almost any reasonable rule for ... imputation will yield 
substantially the same results ... With respect to imputation in censuses and sample surveys 
we have adopted a standard that says we have a low level of imputation, of the order of 1 or 
2 percent, as a goal.'' 


¹ Carl-Erik Särndal, Département de mathématiques et de statistique, Université de Montréal, C.P. 6128, succursale A, 
Montréal (Québec) H3C 3J7. 

Ideally, we should still strive for the goal of only one to two percent imputation. But in our 
time most surveys carried out by large survey organizations show a rate of imputation that 
is much higher. Clearly, if 30% of the values are imputed, the effects of imputation cannot 
be ignored. Imputation can create systematic error (bias) in the point estimate; this is perhaps 
the most serious concern. But even if an imputation method can be found such that there is 
no appreciable systematic error, one must not ignore the often considerable effect that 
imputation has on the precision (the variance) of the point estimate. There is a need for simple 
yet valid variance estimation methods for survey data containing imputations, so that the 
coefficients of variation of the survey estimates can be properly reported. 

A variety of imputation methods have been proposed. These can be classified in different 
ways. One way to classify is by the number of imputations carried out. In single imputation 
methods, a single value is imputed for a missing value. A complete data matrix is obtained, 
in which the imputed values are flagged. Estimates are calculated with the aid of the completed 
set. In multiple imputation, two or more values are imputed for each missing value. Several 
completed data sets are thus obtained. Estimates are calculated with the aid of the completed 
data sets. 

Imputation methods also differ with respect to the modeling underlying the imputation. 
Some imputation methods use an explicit model, as when the imputed value is obtained by 
a regression fit, a ratio or mean imputation. In other methods, the model is only implicit, as 
in hot deck imputation and nearest neighbour donor imputation. The distinctions just made 
are important for this paper. 

Statistics Canada currently uses imputation methods such as nearest neighbour donor, 
current ratio, current mean, previous value, previous mean, auxiliary trend. All of these are 
single imputation methods. The imputed values originate in the Generalized Edit and 
Imputation System (GEIS), from where they enter into the Generalized Estimation System 
(GES), where the point estimates and the variance estimates are calculated in a number of 
different estimation modules. This paper deals in particular with current ratio imputation, which 
represents a case of explicit modeling. 


2. SOME THOUGHTS ON MULTIPLE IMPUTATION 


Multiple imputation was suggested by D.B. Rubin around 1977. His ideas are explained in 
a number of papers, of which Herzog and Rubin (1983) and Rubin (1986) are expository, and 
in a book, Rubin (1987). Multiple imputation has advantages as well as disadvantages; the same 
is true for single imputation. 

Rubin (1986) sees as a disadvantage of single imputation that ‘‘... the one imputed value 
cannot in itself represent uncertainty about which value to impute: If one value were really 
adequate, then that value was never missing. Hence, analyses that treat imputed values just 
like observed values generally systematically underestimate uncertainty, even assuming the 
precise reasons for nonresponse are known.'' 

Multiple imputation is attractive because it communicates the idea that imputation has variability. 
It is precisely this variability - the variability within and between the several completed data 
sets — that is exploited in the variance estimation methods proposed under multiple imputa- 
tion. These methods make powerful use of basic statistical concepts. (On the other hand, one 
can argue that sample selection also has variability, but most surveys cannot afford more than 
a single sample, and estimation must be carried out with this unique sample.) 

Simple examples show that treating imputed values just like observed values can lead to severe 
underestimation of the true uncertainty; survey samplers have long been aware of this. And 
it is a fact that users sometimes treat imputed values just like observed values, with wrong 
statements of precision as a result. With modern computers, it is easy to impute by some rule 
or another, but not so easy to obtain valid variance estimates. 

The citation above seems to conclude that because a single imputed value does not display 
variation, we cannot obtain reasonable variance estimates; we are necessarily led to underestima- 
tion. I do not share this opinion. The methods that I discuss show that valid variance estimation 
is indeed possible with single imputation. 

A method for variance estimation in the presence of imputed values should have the following 
properties: (a) a sound theoretical backing; (b) robustness to the assumptions underlying the 
imputation; (c) it must be practical, easy to carry out, and readily accepted by users. 

While multiple imputation has the ingredients (a) and (b), it is clear that, in some applications 
at least, it does not have the property (c). In the development of the GES we must depend on 
procedures that are easy to administer and easy to accept by the user. The user of a data set 
(someone who is not primarily a statistician) can easily understand that the statistician imputes 
once, with the objective to fill in the best possible value for one that is missing. While it is true 
that for some purposes, such as secondary analyses, it might be interesting to have several 
completed data matrices, the costs of storage of multiple data sets will often rule out this option. 

Multiple imputation may well be useful in other contexts and for other reasons than those 
that are essential to the development of the GES. The multiple imputation method has indicated 
one way of handling the problem of understatement of the variance, at least for some situations. 
The method has recently come under criticism by Fay (1991) and is not the only answer. Let 
us see what can be done with single imputation methods. The method described below is based 
on Särndal (1990). 


3. IMPUTATION VARIANCE AND SAMPLING VARIANCE 


An imputation rule corresponds to an (explicit or implicit) model for the relationship among 
variables of interest to the survey. That is, when the analyst has fixed an imputation rule, he 
or she has in fact chosen a model. The principle for the developments that follow is that if this 
rule is considered good enough for the point estimates (no systematic error), the rule is also 
good enough for the corresponding estimates of variance. In other words, the model maker 
should take responsibility for control of the bias as well as for the appropriateness of the 
variance estimate. 

Let $U = \{1, \ldots, k, \ldots, N\}$ be a finite population; let $y$ denote one of the study variables 
in the survey. The objective is to estimate the population total of $y$, $t = \sum_U y_k$. (If $C$ is any set 
of population units, where $C \subseteq U$, $\sum_C$ is used as shorthand for $\sum_{k \in C}$; for example, $t = \sum_U y_k$ 
means $\sum_{k \in U} y_k$.) A probability sample $s$ is selected with a given sampling design. The inclusion 
probabilities are known, and ordinary design-based variance estimates would be obtained if 
all units $k \in s$ were observed. However, there are missing data. Let $r$ be the subset of $s$ for which 
the values $y_k$ are actually observed. For the complement, $s - r$, imputations are calculated. 
The data after imputation consist of the values denoted $y_{\bullet k}$, $k \in s$, such that 

$$y_{\bullet k} = \begin{cases} y_k & \text{if } k \in r \\ y_{imp,k} & \text{if } k \in s - r \end{cases}$$

where $y_k$ is an actually observed value, and $y_{imp,k}$ denotes the imputed value for the unit $k$. 
The case $r = s$ implies no imputation; all data are actual observations. 

Let us write the estimator of $t$ that would be used in the case of 100% response (that is, $r = s$) 
as $\hat{t} = \sum_{k \in s} w_k y_k = \sum_s w_k y_k$, where $w_k$ is the weight given to the observation $y_k$. For example, 
in simple random sampling without replacement (SRSWOR) of $n$ units from $N$, $w_k = N/n$ 
for all $k \in s$ when the expanded sample mean is used to estimate $t$, and $w_k = (\bar{z}_U/\bar{z}_s)(N/n) = 
(\sum_U z_k)/(\sum_s z_k)$ for all $k \in s$ when the ratio estimator is used with $z$ as an auxiliary variable. 

When the data contain imputations, the estimator of $t$ is $\hat{t}_\bullet = \sum_s w_k y_{\bullet k}$. That is, we assume 
that the weights $w_k$ are identical to those used when all data are actual observations. This 
principle is used in the estimation modules of the GES. It embodies an assumption that 
imputation by the chosen rule causes little or no systematic error in the estimates. 
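
The weighting principle just stated can be sketched in a few lines. The function names and all data values are hypothetical; the sketch only shows that the same weights $w_k$ are applied whether a value is observed or imputed.

```python
import numpy as np

def expansion_weights(n, N):
    """Weights N/n of the expanded sample mean under SRSWOR."""
    return np.full(n, N / n)

def ratio_weights(z_sample, z_total):
    """Weights (sum of z over U) / (sum of z over s) of the ratio estimator."""
    z_sample = np.asarray(z_sample, dtype=float)
    return np.full(z_sample.size, z_total / z_sample.sum())

def completed_data(y_observed, y_imputed, responded):
    """Observed value for k in r, imputed value for k in s - r."""
    return np.where(responded, y_observed, y_imputed)

def estimate_total(weights, y_completed):
    """Weighted sum of the data after imputation."""
    return float(np.sum(weights * y_completed))

# Hypothetical sample of n = 4 units from N = 10; units 2 and 4 did not respond.
responded = np.array([True, False, True, False])
y_observed = np.array([2.0, np.nan, 4.0, np.nan])
y_imputed = np.array([0.0, 3.0, 0.0, 3.0])      # only used where responded is False
y_dot = completed_data(y_observed, y_imputed, responded)
t_dot = estimate_total(expansion_weights(4, 10), y_dot)
```

A quick check of the ratio weights: applied to the auxiliary variable itself, they reproduce its known population total exactly.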

The variance of an estimated total is increased by imputation, because imputation does not 
(except in truly exceptional circumstances) reproduce the true value $y_k$. Concrete evidence of 
this is the fact that if the imputation rule is applied to the actually observed sample units, there 
will always be error. If the rule is not without error for the responding units, it is not without 
error for the nonresponding units either. In Section 4 we express the variance of $\hat{t}_\bullet$ as a sum 
of two components, a sampling variance and a variance due to imputation, 

$$V_{tot} = V_{sam} + V_{imp}.$$

The imputation variance $V_{imp}$ is zero if all data are actually observed values, or if the 
imputation procedure is capable of exactly reproducing the true value $y_k$ for every unit requiring 
imputation. (Neither case is likely in practice.) The procedure given in Section 4 uses the data 
after imputation, $y_{\bullet k}$, $k \in s$, to obtain estimates of each of the two components, leading to 

$$\hat{V}_{tot} = \hat{V}_{sam} + \hat{V}_{imp}.$$


The component $\hat{V}_{sam}$ is calculated in two steps: 

(1) Compute the standard design-based variance estimate using the data after imputation. (For 
example, if SRSWOR is used, and $r = s$, the standard unbiased variance estimate of 
$N\bar{y}_s$ is $N^2(1/n - 1/N)\sum_s (y_k - \bar{y}_s)^2/(n - 1)$. This formula, calculated on the data 
after imputation, yields $N^2(1/n - 1/N)\sum_s (y_{\bullet k} - \bar{y}_{\bullet s})^2/(n - 1)$, where $\bar{y}_{\bullet s}$ is the 
mean of the $n$ values $y_{\bullet k}$.) 

(2) Add a term to correct for the fact that many imputation rules give data with "less than 
natural" variability, which would lead to understatement of the sampling variance unless 
corrective action is taken. 

Finally, the component $\hat{V}_{imp}$ is readily computed from the data after imputation. The user 
will easily accept the argument that the variance obtained by the standard formula is not 
sufficient in itself; something must be added because the imputation rule is less than perfect. 

The method has the good property that if no imputation is required, that is, $r = s$, then 
$\hat{V}_{imp} = 0$ and $\hat{V}_{tot}$ equals the "standard variance estimator" that one would have used with 
100% actually observed values. 


4. THEORETICAL DEVELOPMENTS 
The total error of $\hat{t}_\bullet$ is decomposed as

$$\hat{t}_\bullet - t = (\hat{t} - t) + (\hat{t}_\bullet - \hat{t}) = \text{sampling error} + \text{imputation error}.$$
The imputation error is the difference between the unknown estimate that would have been
calculated if the data had consisted entirely of actual observations and the estimate that can
be calculated on the data after imputation. The imputation error is

$$\hat{t}_\bullet - \hat{t} = -\sum_{s-r} w_k e_k,$$

where

$$e_k = y_k - y_{imp,k}$$


is an imputation residual which cannot be observed for a unit $k \in s - r$. The magnitude of $e_k$
depends on how well the imputation model fits. The residuals are small if the imputation method
gives nearly perfect substitute values. To pursue the argument, different directions may be
taken. Here, we use a model assisted approach in which three different probability distributions
are considered. The corresponding expectation symbols are written as $E_\xi$, $E_s$, and $E_r$. Here,
$\xi$ indicates "with respect to the imputation model"; $s$ indicates "with respect to the sampling
design", and $r$ indicates "with respect to the response mechanism, given $s$". The model is
implied by the imputation rule, so it is known; the sampling design is the given probability 
sampling distribution, so it is also known; the response mechanism is an ordinarily unknown 
distribution governing the response, given the sample s. 

The estimator $\hat{t}_\bullet$ is overall unbiased in the sense that $E_\xi E_s E_r(\hat{t}_\bullet - t) = 0$ if two conditions
hold:


(a) the order of the expectation operators can be changed so that $E_\xi E_s E_r(\cdot)$ can be evaluated
as $E_s E_r\{E_\xi(\cdot \mid s, r)\}$, and

(b) the imputation residual $e_k = y_k - y_{imp,k}$ has zero model expectation for every $k \in r$, that
is, $E_\xi(e_k) = 0$, which implies that $E_\xi(\hat{t}_\bullet - \hat{t}) = 0$.


Condition (a) is satisfied if the response mechanism is one that may depend on $s$ and on
auxiliary data, but not on the y-values, $y_k$, $k \in s$. That is, the probability $q(r)$ of realizing the
response set $r$ is of the form $q(r) = q(r \mid s, \{x_k : k \in s\})$, where $\{x_k : k \in s\}$ denote the auxiliary
data. The response mechanism can then be said to be ignorable.

We now examine the overall variance given by

$$V_{tot} = E_\xi E_s E_r (\hat{t}_\bullet - t)^2,$$

which may also be called the anticipated variance under the imputation model $\xi$. We obtain

$$V_{tot} = E_\xi E_s E_r \{(\hat{t} - t) + (\hat{t}_\bullet - \hat{t})\}^2 = E_\xi E_s (\hat{t} - t)^2 + E_s E_r E_\xi\{(\hat{t}_\bullet - \hat{t})^2 \mid s, r\} = E_\xi V_s + E_s E_r V_{\xi c}, \qquad (4.1)$$

where $V_s = E_s\{(\hat{t} - t)^2\}$ is the design-based variance of $\hat{t}$, supposing $\hat{t}$ is design unbiased for
the total $t$. (For an estimator with some slight design bias, $V_s$ is the design-based mean square
error of $\hat{t}$.) Note that $(\hat{t} - t)$ depends on $s$ only, and not on $r$. Moreover,

$$V_{\xi c} = E_\xi\{(\hat{t}_\bullet - \hat{t})^2 \mid s, r\}$$

is the model variance of the imputation error, conditionally on $s$ and $r$. The subscript $c$ stands
for "conditional". The derivation of (4.1) assumes that condition (a) holds so that the expectation
$E_\xi$ can be moved inside $E_s E_r$, and that the mixed term

$$2 E_\xi E_s [(\hat{t} - t) E_r\{(\hat{t}_\bullet - \hat{t}) \mid s\}] \qquad (4.2)$$


vanishes or is sufficiently close to zero that we can ignore it. This would be the case if the 
expected imputation error is zero or negligible under the response mechanism, conditionally 
on the realized probability sample s. Even if (4.2) is not exactly zero for the mechanism that 
determines the response, we can in many cases approximate (4.2) by zero and still use the method 
below to obtain a variance estimate that is much better than pretending naively that imputed 
data are as good as actually observed data. For ratio imputation and SRSWOR, which is an 
application considered in Section 5, the term (4.2) is exactly zero. 


If we denote $V_{sam} = E_\xi V_s$ and $V_{imp} = E_s E_r V_{\xi c}$ in (4.1), then
$$V_{tot} = V_{sam} + V_{imp}$$
or 


overall variance = sampling variance + imputation variance. 


The objective is to estimate the overall variance, so that a valid confidence interval for the
unknown $t$ can be calculated. Our approach is to obtain separate estimates, $\hat{V}_{sam}$ and $\hat{V}_{imp}$, of
the two components $V_{sam} = E_\xi V_s$ and $V_{imp} = E_s E_r V_{\xi c}$. The data available for this estimation
are $y_{\bullet k}$, $k \in s$. The argument for obtaining $\hat{V}_{sam}$ and $\hat{V}_{imp}$ is as follows:


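(i) Estimation of the sampling variance component. Let $\hat{V}_s$ be the standard (design-unbiased
or nearly design-unbiased) estimator of the design variance $V_s$. Denote by $\hat{V}_{\bullet s}$ the quantity
obtained by calculating $\hat{V}_s$ from the data after imputation, $y_{\bullet k}$, $k \in s$. For many imputation
rules, $\hat{V}_{\bullet s}$ underestimates $V_{sam}$. The underestimation is compensated in the following
way. Evaluate the conditional expectation

$$E_\xi(\hat{V}_s - \hat{V}_{\bullet s} \mid s, r) = V_{dif}.$$

Then for given $s$ and $r$, find a model unbiased estimator, denoted $\hat{V}_{dif}$, of $V_{dif}$. This will
usually require the estimation of certain parameters of the model $\xi$. Consequently,

$$E_\xi(\hat{V}_{dif} \mid s, r) = E_\xi(\hat{V}_s - \hat{V}_{\bullet s} \mid s, r).$$

Then

$$\hat{V}_{sam} = \hat{V}_{\bullet s} + \hat{V}_{dif}$$

is overall unbiased for the component $V_{sam} = E_\xi V_s$, as the following derivation shows:

$$E_\xi E_s E_r(\hat{V}_{sam}) = E_s E_r\{E_\xi(\hat{V}_{\bullet s} \mid s, r) + E_\xi(\hat{V}_{dif} \mid s, r)\} = E_s E_r\{E_\xi(\hat{V}_s \mid s, r)\} = E_\xi E_s(\hat{V}_s) = E_\xi V_s.$$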
(ii) Estimation of the imputation variance component. Simply find an estimator, $\hat{V}_{\xi c}$, that is
model unbiased for $V_{\xi c}$. That is, $E_\xi(\hat{V}_{\xi c}) = V_{\xi c}$. Again, this may require the estimation
of unknown parameters of the model $\xi$. Then $\hat{V}_{\xi c}$ is overall unbiased for the imputation
variance component $V_{imp}$, since

$$E_\xi E_s E_r(\hat{V}_{\xi c}) = E_s E_r V_{\xi c} = V_{imp}.$$

Finally, an overall unbiased estimator of $V_{tot}$ is given by

$$\hat{V}_{tot} = \hat{V}_{sam} + \hat{V}_{imp},$$

where $\hat{V}_{sam} = \hat{V}_{\bullet s} + \hat{V}_{dif}$ and $\hat{V}_{imp} = \hat{V}_{\xi c}$. Note that the role of $\hat{V}_{dif}$ is to correct for the fact
that the data after imputation may display "less than natural" variation. This often happens
when $y_{imp,k}$ equals the predicted value from a fitted regression, that is, "the value on the line".
The variation around the line is not reflected in the predicted value.

To be overall unbiased, the estimator $\hat{V}_{tot}$ constructed above requires that condition
(a) holds, that (4.2) is zero, and that the imputation model is correct, so that $\hat{V}_{dif}$ and $\hat{V}_{\xi c}$
are model unbiased for $V_{dif}$ and $V_{\xi c}$, respectively. Mild departures from the assumed
imputation model may not have serious consequences, but if the imputation model is grossly
misspecified it is clear that $\hat{V}_{tot}$ may be considerably biased because of the model bias of $\hat{V}_{dif}$
and $\hat{V}_{\xi c}$. Monte Carlo simulations reported in Lee, Rancourt and Särndal (1992) show that
the variance estimator $\hat{V}_{tot}$ is fairly robust to imputation model breakdown. To add the terms
$\hat{V}_{dif}$ and $\hat{V}_{\xi c}$ is in any case a vast improvement on simply using the naive uncorrected variance
estimator $\hat{V}_{\bullet s}$.

Note that if the imputation model holds, an unbiased variance estimate is obtained with
the method even if the response probabilities differ among units, as long as they depend
on the $x_k$-values only. That is, we can allow a systematic response pattern such that large
$x_k$-value units are less likely to respond than small $x_k$-value units. If the response probabilities
depend explicitly on the $y_k$-values, then the situation is different; the response mechanism is
nonignorable and condition (a) does not hold. There will now be bias in $\hat{V}_{tot}$ due to
nonignorability; the simulations in Lee, Rancourt and Särndal (1992) throw some light on the
magnitude of this bias.


Example. The sample $s$ is drawn with SRSWOR; $n$ units from $N$. Let $m$ denote the size of the
response set $r$. Suppose the respondent mean is imputed for units requiring imputation. The
corresponding imputation model $\xi$ states that $y_k = \beta + \varepsilon_k$, where the $\varepsilon_k$ are uncorrelated
error terms with $E_\xi(\varepsilon_k) = 0$, $V_\xi(\varepsilon_k) = \sigma^2$. That is, $y_{\bullet k} = y_k$ if $k \in r$ and $y_{\bullet k} = \hat{\beta} = \bar{y}_r$ if
$k \in s - r$, and we obtain the estimator $\hat{t}_\bullet = (N/n)\sum_s y_{\bullet k} = N\bar{y}_r$. Here the standard design-based
variance estimator for 100% response is $\hat{V}_s = N^2(1/n - 1/N)\sum_s (y_k - \bar{y}_s)^2/(n - 1)$;
when this formula is computed on data after imputation we get $\hat{V}_{\bullet s} = N^2(1/n - 1/N)\{(m - 1)/(n - 1)\}S_{yr}^2$,
where $S_{yr}^2 = \sum_r (y_k - \bar{y}_r)^2/(m - 1)$. Other derivations give $\hat{V}_{dif} = N^2(1/n - 1/N)\{(n - m)/(n - 1)\}S_{yr}^2$
and $\hat{V}_{imp} = N^2(1/m - 1/n)S_{yr}^2$. Thus, $\hat{V}_{sam} = \hat{V}_{\bullet s} + \hat{V}_{dif} = N^2(1/n - 1/N)S_{yr}^2$,
and $\hat{V}_{tot} = N^2(1/m - 1/N)S_{yr}^2$, which is easy to accept
as a "good" variance estimator for this simple imputation rule. The following table shows
the contribution of each of the three terms to the total variance estimator $\hat{V}_{tot}$, for different
rates of imputation, assuming that $N$ is large compared to $m$ and $n$, and $(m - 1)/m \approx (n - 1)/n \approx 1$.
Imputation rate in %      % contribution to $\hat{V}_{tot}$
$100(1 - m/n)$            $\hat{V}_{\bullet s}$    $\hat{V}_{dif}$    $\hat{V}_{imp}$
10                        81      9       10
20                        64      16      20
30                        49      21      30
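
The percentages in the table follow directly from the three formulas of the example. The short Python sketch below recomputes them; the function name `variance_shares` and the sample sizes are illustrative assumptions, and $N$ is taken very large so that $1/N$ is negligible, as the text assumes.

```python
def variance_shares(n, m, N=10**9):
    """Percent shares of V_.s, V_dif and V_imp in V_tot (respondent-mean imputation).

    S^2_yr is common to all three terms and cancels in the shares, so it is set to 1.
    """
    fpc = 1.0 / n - 1.0 / N
    v_naive = N**2 * fpc * (m - 1) / (n - 1)   # standard formula on imputed data
    v_dif = N**2 * fpc * (n - m) / (n - 1)     # correction for lost variability
    v_imp = N**2 * (1.0 / m - 1.0 / n)         # imputation variance
    v_tot = v_naive + v_dif + v_imp
    return tuple(round(100 * v / v_tot) for v in (v_naive, v_dif, v_imp))


for rate in (10, 20, 30):
    n = 10_000
    m = n * (100 - rate) // 100
    print(rate, variance_shares(n, m))
```

In the large-sample limit the three shares are $(m/n)^2$, $(m/n)(1 - m/n)$ and $1 - m/n$, which is why the naive term falls below one half once the imputation rate reaches 30%.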


The table illustrates the dangers of acting as if imputations are real data: with 30% imputed
values, the standard formula variance estimator $\hat{V}_{\bullet s}$ in this example covers less than half of
the correctly estimated total variance. Imputation by the respondent mean is useful as an
example; the results are particularly simple. But usually in practice, respondent mean imputation
is neither justified nor efficient. The underlying model is not sophisticated enough to avoid
systematic error in the point estimates, and the residuals $e_k = y_k - \bar{y}_r$ can vary considerably.


5. APPLICATION TO IMPUTATION BY THE CURRENT 
RATIO METHOD 


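The method assumes that a positive auxiliary value $x_k$ is known for every unit $k \in s$. If
$k \in s - r$, we impute $y_{imp,k} = \hat{B}x_k$ with $\hat{B} = (\sum_r y_k)/(\sum_r x_k)$. The data after imputation are

$$y_{\bullet k} = \begin{cases} y_k & \text{if } k \in r, \\ \hat{B}x_k & \text{if } k \in s - r. \end{cases}$$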


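The model behind current ratio imputation is
$$y_k = \beta x_k + \varepsilon_k, \qquad (5.1)$$
where the $\varepsilon_k$ are uncorrelated model errors such that
$$E_\xi(\varepsilon_k) = 0, \quad E_\xi(\varepsilon_k^2) = \sigma^2 x_k. \qquad (5.2)$$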

Suppose that the sample $s$ is selected by SRSWOR. Let the respective sizes of $s$, $r$, and $s - r$
be $n$, $m$, and $n - m$. If no imputation was needed, the estimator of $t = \sum_U y_k$ would be
$\hat{t} = N\bar{y}_s$. Using the data after imputation, we get

$$\hat{t}_\bullet = (N/n)\sum_s y_{\bullet k} = N\bar{x}_s\bar{y}_r/\bar{x}_r. \qquad (5.3)$$


(Overbar and subscript $s$, $r$, or $s - r$ indicates "straight mean", for example, $\bar{y}_r = \sum_r y_k/m$,
$\bar{x}_{s-r} = \sum_{s-r} x_k/(n - m)$, etc.) Using the results of the preceding section, we have $V_{tot} = V_{sam} + V_{imp}$
with $V_{sam} = E_\xi\{N^2(1/n - 1/N)S_{yU}^2\}$ and $V_{imp} = E_s E_r\{N^2(1/m - 1/n)C_1\sigma^2\}$,
where $S_{yU}^2 = \sum_U (y_k - \bar{y}_U)^2/(N - 1)$ and $C_1 = \bar{x}_s\bar{x}_{s-r}/\bar{x}_r$, a known constant. The mixed
term (4.2) is exactly zero in this case. Our method of variance estimation gives $\hat{V}_{tot} = \hat{V}_{sam} + \hat{V}_{imp}$, where

$$\hat{V}_{sam} = N^2(1/n - 1/N)(S_{y \bullet s}^2 + C_0\hat{\sigma}^2), \qquad (5.4)$$

$$\hat{V}_{imp} = N^2(1/m - 1/n)C_1\hat{\sigma}^2, \qquad (5.5)$$


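where $S_{y \bullet s}^2 = \sum_s (y_{\bullet k} - \bar{y}_{\bullet s})^2/(n - 1)$ is the variance calculated on data after imputation,
and we have chosen to estimate $\sigma^2$ by the model unbiased formula

$$\hat{\sigma}^2 = \frac{1}{\bar{x}_r\{1 - (1/m)(cv_{xr})^2\}} \cdot \frac{\sum_r (y_k - \hat{B}x_k)^2}{m - 1},$$

where $cv_{xr} = S_{xr}/\bar{x}_r$ is the coefficient of variation of $x$ in the response set $r$. The constant $C_0$
is obtained as

$$C_0 = \frac{1}{\sigma^2} E_\xi(S_{ys}^2 - S_{y \bullet s}^2 \mid s, r),$$

where

$$S_{ys}^2 = \frac{1}{n - 1}\sum_s (y_k - \bar{y}_s)^2$$

is the (unknown) sample variance based on data with 100% actual observations. After
evaluation,

$$C_0 = \frac{1}{n - 1}\left\{\sum_{s-r} x_k - \frac{\sum_{s-r} x_k^2}{\sum_r x_k} + \frac{1}{n}\,\frac{(\sum_s x_k)(\sum_{s-r} x_k)}{\sum_r x_k}\right\}.$$

If $m$ is not too small, the approximations $\hat{\sigma}^2 \approx (\sum_r e_k^2)/(\sum_r x_k)$ with $e_k = y_k - \hat{B}x_k$ and
$C_0 \approx (1 - m/n)\bar{x}_{s-r}$ are sufficiently good for most applications.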


We can write the imputation variance component as

$$\hat{V}_{imp} = N^2(1/m - 1/n)A\bar{x}_s\hat{\sigma}^2,$$

where $A = \bar{x}_{s-r}/\bar{x}_r$. The constant $A$ reflects the selection effect due to nonresponse. If large
units are less inclined to respond than small units, then $A$ may be considerably greater than
unity, and, for a given sample $s$ and a given number $m$ of respondents, the component $\hat{V}_{imp}$
tends to be large, relative to a case where, say, all units are equally likely to respond. This
tendency makes good sense intuitively.


Two special cases are noted: (1) If all $x_k = 1$, the estimated total variance becomes simply

$$\hat{V}_{tot} = \hat{V}_{sam} + \hat{V}_{imp} = N^2(1/m - 1/N)S_{yr}^2,$$

where $S_{yr}^2$ is the variance of the $m$ actual observations $y_k$. This agrees with the variance
obtained under a two-phase sampling design with SRSWOR in each phase. (2) If no imputation
is required, that is, if $s = r$, then $\hat{V}_{imp} = 0$ and

$$\hat{V}_{tot} = \hat{V}_{sam} = N^2(1/n - 1/N)S_{ys}^2.$$

That is, our method yields the well known variance estimator for SRSWOR.
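
The estimator (5.3) and the variance components (5.4) and (5.5) can be put together in a few lines. The following Python sketch is illustrative only: the function name and argument layout are assumptions, and the closed form used for $C_0$ is the one obtained by evaluating its definition under the ratio model (5.1)-(5.2), so it should be read as an assumption of the sketch rather than as a quotation of the paper.

```python
def ratio_imputation_variance(y_r, x_r, x_nr, N):
    """Point estimate and variance components under current ratio imputation, SRSWOR.

    y_r, x_r : y- and x-values of the m respondents (set r)
    x_nr     : x-values of the n - m units requiring imputation (set s - r)
    N        : population size
    Returns (t_hat, v_sam, v_imp).
    """
    m, n = len(y_r), len(y_r) + len(x_nr)
    sum_xr, sum_xnr = sum(x_r), sum(x_nr)
    b_hat = sum(y_r) / sum_xr
    # Data after imputation: observed values for r, b_hat * x for s - r.
    y_dot = list(y_r) + [b_hat * x for x in x_nr]
    mean_dot = sum(y_dot) / n
    t_hat = N * mean_dot
    s2_dot = sum((y - mean_dot) ** 2 for y in y_dot) / (n - 1)
    # Model-unbiased estimate of sigma^2.
    xbar_r = sum_xr / m
    cv2 = sum((x - xbar_r) ** 2 for x in x_r) / (m - 1) / xbar_r**2
    ss_res = sum((y - b_hat * x) ** 2 for y, x in zip(y_r, x_r))
    sigma2 = ss_res / ((m - 1) * xbar_r * (1 - cv2 / m))
    # Constants C0 (assumed closed form, see lead-in) and C1.
    sum_xs = sum_xr + sum_xnr
    c0 = (sum_xnr - sum(x * x for x in x_nr) / sum_xr
          + sum_xs * sum_xnr / (n * sum_xr)) / (n - 1)
    c1 = (sum_xs / n) * (sum_xnr / (n - m)) / xbar_r if n > m else 0.0
    v_sam = N**2 * (1 / n - 1 / N) * (s2_dot + c0 * sigma2)
    v_imp = N**2 * (1 / m - 1 / n) * c1 * sigma2
    return t_hat, v_sam, v_imp


# Special case (1): all x_k = 1, so the total equals N^2 (1/m - 1/N) S^2_yr.
t_hat, v_sam, v_imp = ratio_imputation_variance([1, 2, 3, 4, 6], [1] * 5, [1] * 3, N=100)
print(t_hat, v_sam + v_imp)
```

Both special cases noted in the text serve as quick sanity checks: with all $x_k = 1$ the two components add up to the two-phase formula, and with an empty imputation set the imputation component vanishes.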


A Monte Carlo study with 100,000 repeated response sets $r$ was carried out to confirm the
above results for current ratio imputation. A finite population of size $N = 100$ was generated
according to the model consisting of (5.1) and (5.2). The typical response set $r$ was obtained
as follows: Draw an SRSWOR sample $s$ of size $n = 30$; given $s$, generate $r$ by a response
mechanism in the form of independent Bernoulli trials, one for each $k \in s$, with probability $\theta_k$
for the outcome "response". Three different response mechanisms were used: Mechanism 1:
$\theta_k$ increases with $y_k$ in such a way that $\theta_k = 1 - \exp(-a_1 y_k)$; Mechanism 2: $\theta_k$ increases as
$y_k$ decreases in such a way that $\theta_k = \exp(-a_2 y_k)$; Mechanism 3: $\theta_k$ is constant at 0.7, that
is, a uniform response mechanism. The constants $a_1$ and $a_2$ in the first two response
mechanisms (which can be described as non-ignorable) were fixed to obtain an average response
probability of 0.7. The sizes of the realized response sets $r$ thus varied around a mean of 21
for all three mechanisms. For each $r$, the point estimate $\hat{t}_\bullet$ given by (5.3) was calculated as
well as three different variance estimators, $\hat{V} = \hat{V}(\hat{t}_\bullet)$. These were: (1) the model assisted
variance estimator $\hat{V}_{tot} = \hat{V}_{sam} + \hat{V}_{imp}$ equal to the total of (5.4) and (5.5); (2) the two-phase
sampling variance estimator $N^2(1/n - 1/N)S_{yr}^2 + N^2(1/m - 1/n)\sum_r e_k^2/(m - 1)$, an
estimator which follows from standard two-phase sampling theory with an assumption of
SRSWOR subsampling of $m$ respondents from the $n$ units in the initial sample (Rao 1990);
and (3) the standard unadjusted variance estimator $N^2(1/n - 1/N)S_{y \bullet s}^2$, obtained by acting
as if imputations are as good as actual data. The results are shown in the following table.


Relative bias of $\hat{V}$ in %

Estimator $\hat{V}$        Mechanism 1    Mechanism 2    Mechanism 3

Model assisted             -0.20          -4.64          -3.99
Two-phase                   9.95         -12.49          -1.11
Standard unadjusted       -25.73         -37.90         -33.21


The relative bias of an estimator $\hat{V}$ was calculated as $\{\mathrm{mean}(\hat{V}) - \mathrm{var}(\hat{t}_\bullet)\}/\mathrm{var}(\hat{t}_\bullet)$,
where $\mathrm{mean}(\hat{V})$ is the mean of the 100,000 values of $\hat{V}$, and $\mathrm{var}(\hat{t}_\bullet)$ is the variance of the
100,000 values of $\hat{t}_\bullet$. The simulation shows that the model assisted variance estimator
$\hat{V}_{tot} = \hat{V}_{sam} + \hat{V}_{imp}$ is nearly unbiased for all three response mechanisms. In a way, this is not
surprising because the population was generated to agree with the ratio imputation model.
Mechanisms 1 and 2 are of the nonignorable kind and do not satisfy condition (a) of Section
4 required for unbiasedness of $\hat{V}_{tot}$. Interestingly, though, in this example the bias of $\hat{V}_{tot}$
remains small despite this. The two-phase estimator works well for the uniform response
mechanism 3, the case for which it was conceived; otherwise it is biased. Finally, to act as if
imputed data are as good as actual data leads, as expected, to a dramatic understatement of
the true variance for all three mechanisms. A more extensive Monte Carlo study of ratio
estimation is reported in Lee, Rancourt and Särndal (1992). This paper gives an idea of the
effect of imputation model misspecification, which is also discussed in Rao (1992).
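
A scaled-down version of such a simulation can be sketched in Python for the uniform response mechanism only. Everything specific here is an assumption made for illustration: the distribution of $x$, the values $\beta = 2$ and $\sigma = 4$, the number of repetitions, the function names, and the closed form used for $C_0$ (obtained by evaluating its definition under the ratio model). The sketch is not intended to reproduce the figures in the table.

```python
import random
from statistics import mean, pvariance


def estimators(y_r, x_r, x_nr, N):
    """Return (t_hat, model-assisted V_tot, naive V) under current ratio imputation."""
    m, n = len(y_r), len(y_r) + len(x_nr)
    sx_r, sx_nr = sum(x_r), sum(x_nr)
    b = sum(y_r) / sx_r
    y_dot = list(y_r) + [b * x for x in x_nr]          # data after imputation
    ybar = sum(y_dot) / n
    s2_dot = sum((y - ybar) ** 2 for y in y_dot) / (n - 1)
    xbar_r = sx_r / m
    cv2 = sum((x - xbar_r) ** 2 for x in x_r) / (m - 1) / xbar_r**2
    sigma2 = sum((y - b * x) ** 2 for y, x in zip(y_r, x_r)) / (
        (m - 1) * xbar_r * (1 - cv2 / m))
    sx_s = sx_r + sx_nr
    c0 = (sx_nr - sum(x * x for x in x_nr) / sx_r
          + sx_s * sx_nr / (n * sx_r)) / (n - 1)
    c1 = (sx_s / n) * (sx_nr / (n - m)) / xbar_r if n > m else 0.0
    f1, f2 = N**2 * (1 / n - 1 / N), N**2 * (1 / m - 1 / n)
    return N * ybar, f1 * (s2_dot + c0 * sigma2) + f2 * c1 * sigma2, f1 * s2_dot


def relative_biases(reps=4000, N=100, n=30, theta=0.7, seed=1):
    """Relative bias of the model-assisted and naive estimators (uniform response)."""
    rng = random.Random(seed)
    # Hypothetical population following (5.1)-(5.2): beta = 2, sigma = 4.
    x = [rng.uniform(1.0, 10.0) for _ in range(N)]
    y = [2.0 * xk + 4.0 * xk**0.5 * rng.gauss(0.0, 1.0) for xk in x]
    t_hats, v_model, v_naive = [], [], []
    while len(t_hats) < reps:
        s = rng.sample(range(N), n)                    # SRSWOR sample
        r = [k for k in s if rng.random() < theta]     # Bernoulli response
        if len(r) < 2:                                 # variance needs m >= 2
            continue
        r_set = set(r)
        nr = [k for k in s if k not in r_set]
        t, vm, vn = estimators([y[k] for k in r], [x[k] for k in r],
                               [x[k] for k in nr], N)
        t_hats.append(t)
        v_model.append(vm)
        v_naive.append(vn)
    true_var = pvariance(t_hats)
    return ((mean(v_model) - true_var) / true_var,
            (mean(v_naive) - true_var) / true_var)


if __name__ == "__main__":
    print(relative_biases())
```

With a population of this kind the naive estimator should come out clearly negative, while the model assisted estimator should stay close to zero up to Monte Carlo error.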


6. IMPUTED VALUES THAT HAVE AN ADDED RESIDUAL 


We can distinguish two types of imputed values: (1) the imputed value $y_{imp,k}$ consists of a
predicted value only, $y_{pred,k}$, as when the value on a fitted regression line or surface is used.
For example in the current ratio imputation method as used above, $y_{imp,k} = y_{pred,k} = \hat{B}x_k$
with $\hat{B} = (\sum_r y_k)/(\sum_r x_k)$; (2) the imputed value $y_{imp,k}$ consists of a predicted value and a
residual, so that $y_{imp,k} = y_{pred,k} + e_k^*$. The residual term, whose purpose is to make
imputed values more like actual observations, may be obtained by sampling the residuals
$e_k = y_k - y_{pred,k}$ calculated for the responding units $k \in r$. A scheme for this is given below.
This type of imputation is sometimes recommended in the literature as a means of preserving 
the distributions of the imputed data; see, for example, the discussion in Little (1988). The 
imputation process then requires more effort to complete, and for the purposes of the GES 
(whose principal aim is valid estimation of the precision of survey estimates), it is not clear 
that the advantages gained are worth the extra effort. 

Let us, however, indicate one scheme for imputation by ‘‘predicted value plus residual’’ 
in the case where the current ratio imputation model is taken as the point of departure: For 
$k \in r$, calculate $e_k = y_k - \hat{B}x_k$ with $\hat{B} = (\sum_r y_k)/(\sum_r x_k)$, then $\tilde{e}_k = e_k/\sqrt{x_k}$. This gives a
supply of $m$ "standardized residuals" $\tilde{e}_k$. Then for a unit $k \in s - r$, calculate $e_k^0 = \sqrt{x_k}\,\tilde{e}_j$,
where $\tilde{e}_j$ is drawn by SRSWR from the supply, and $x_k$ belongs to the unit requiring imputation.
Then large x-value units tend to obtain larger residuals $e_k^0$, which is consistent with the
model. Then set $e_k^* = e_k^0 - (\sum_{s-r} e_k^0)/(n - m)$. For $k \in s - r$, impute $y_{imp,k} = \hat{B}x_k + e_k^*$;
for $k \in r$, we have actual observations, $y_k$. Since the $e_k^*$ were made to sum to zero over
$s - r$, the point estimator is given by $\hat{t}_\bullet = (N/n)\sum_s y_{\bullet k} = N\bar{x}_s\bar{y}_r/\bar{x}_r$ as in Section 5, but its
variance is different. It can be shown that $E_\xi E_s E_r E_*(S_{y \bullet s}^2 - S_{ys}^2) \approx 0$, where $E_*$ denotes
average with respect to the random selection of a standardized residual. That is, the difference
between the variance calculated on data after imputation, $S_{y \bullet s}^2$, and the unknown variance of
a sample consisting entirely of actual observations, $S_{ys}^2$, is approximately zero on the average.
We can use $\hat{V}_{sam} = N^2(1/n - 1/N)S_{y \bullet s}^2$ as an approximately overall unbiased estimator of
the sampling variance component. There is no need now to add a correction $\hat{V}_{dif}$. However,
an estimator of the imputation variance $V_{imp} = N^2(1/m - 1/n)C_1\sigma^2$ must still be calculated
and added to $\hat{V}_{sam}$.
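
The "predicted value plus residual" scheme just described can be sketched in Python as follows. The function name and the `rng` argument are illustrative assumptions; the steps (standardize, draw with replacement, rescale, centre) follow the description above.

```python
import random


def impute_with_residuals(y_r, x_r, x_nr, rng=random):
    """Ratio imputation with an added, centred residual for each unit in s - r.

    y_r, x_r : y- and x-values of the respondents
    x_nr     : x-values of the units requiring imputation (must be non-empty)
    rng      : object with a .choice method (the random module or a random.Random)
    """
    b_hat = sum(y_r) / sum(x_r)
    # Standardized residuals from the respondents.
    std_res = [(y - b_hat * x) / x**0.5 for y, x in zip(y_r, x_r)]
    # One SRSWR draw per unit, rescaled by the square root of its own x-value.
    raw = [x**0.5 * rng.choice(std_res) for x in x_nr]
    # Centre the residuals so that they sum to zero over s - r.
    shift = sum(raw) / len(raw)
    return [b_hat * x + e - shift for x, e in zip(x_nr, raw)]


imputed = impute_with_residuals([2, 5, 9, 7], [1, 2, 4, 4], [3, 5], random.Random(0))
print(imputed, sum(imputed))
```

Because the residuals are centred, the sum of the imputed values equals $\hat{B}\sum_{s-r} x_k$, so the point estimate is the same as under plain ratio imputation; only the spread of the imputed values changes.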


7. CONCLUDING REMARKS 


The continued work on the variance estimation techniques outlined in this paper has the 
following objectives: (1) extensions to imputation procedures based on models that are implicit 
only, in particular the nearest neighbour donor method; (2) extensions to the case where there 
is a mixture of several imputation procedures in the same survey. 

Deville and Särndal (1992) present results for an extension in which the Horvitz-Thompson
estimator, $\hat{t} = \sum_s y_k/\pi_k$, serves as the prototype. The estimator using data after imputation
is then

$$\hat{t}_\bullet = \sum_r y_k/\pi_k + \Big(\sum_{s-r} \mathbf{x}_k'/\pi_k\Big)\hat{\mathbf{B}} = \sum_s y_k/\pi_k - \sum_{s-r} e_k/\pi_k,$$

where $e_k = y_k - \mathbf{x}_k'\hat{\mathbf{B}}$ is the imputation residual for unit $k$ obtained by multiple regression.
A Sample Allocation Method for Two-Phase
Survey Designs


J.B. ARMSTRONG and C.F.J. WU¹


ABSTRACT 


Motivated by a business survey design at Statistics Canada, we formulate the problem of sample allocation 
for a general two-phase survey design as a constrained nonlinear programming problem. By exploiting 
its mathematical structure, we propose a solution method that consists of iterations between two 
subproblems that are computationally much simpler. Using an approximate solution as a starting value, 
the proposed method works very well in an empirical study. 


KEY WORDS: Optimal allocation; Convex programming. 


1. INTRODUCTION 


The purpose of this paper is to propose a method of sample allocation for two-phase survey 
designs. Suppose it is necessary to stratify a population of size N into L strata according to 
an auxiliary variable, z, whose information is not known before sampling. Values of a second 
auxiliary (size) variable, x, that is correlated with the variable of interest, y, are known for 
all units in the population. At the first phase of sampling, the population is divided into G 
strata according to x. An initial sample is drawn from size stratum $g$ ($g = 1, 2, \ldots, G$) using
simple random sampling with sampling fraction $\nu_{1g}$, and the z-value for each sampled unit is
observed. At the second phase, units in the sample from size stratum $g$ with z-value in class
$h$ ($h = 1, 2, \ldots, L$), are subsampled using sampling fraction $\nu_{2gh}$. The value of y is observed
for units in the second-phase sample. 

In the case of no size stratification ($G = 1$) Cochran (1977) gives the allocation that
minimizes the variance of the estimate $\hat{Y} = \sum_h \sum_{i \in s_2 \cap h} y_i/(\nu_1 \cdot \nu_{2h})$ of the population total
$Y = \sum_h N_h \cdot \bar{Y}_h$, subject to a fixed survey cost, $C$, where $N_h$ and $\bar{Y}_h$ are the population size
and population mean, respectively, for stratum $h$ and $\sum_{i \in s_2 \cap h} y_i$ denotes the sum of y-values
for units in the second phase sample, $s_2$, with z-value in class $h$. If survey estimates are used for
analytical purposes, the variance of the estimated total for z class $h$, $\hat{Y}_h = \sum_{i \in s_2 \cap h} y_i/(\nu_1 \cdot \nu_{2h})$,
is also of interest. Sedransk (1965), Booth and Sedransk (1969), Rao (1973) and Smith (1989) 
have studied allocation problems involving the minimization of a function of variances of 
estimated class totals, subject to a cost constraint. 

The method described in this paper can be used to solve the allocation problem for general 
G when there is a constraint on the variance of the estimated total for each z class. The method 
was motivated by an application in a business survey conducted by Statistics Canada. The survey 
involves the sampling of tax records for businesses. 

Information about the population of taxfilers is made available to Statistics Canada by 
Revenue Canada. There is a requirement to produce estimates of financial variables for domains 
defined by a cross-classification of four-digit Standard Industrial Classification (SIC4) and 
province. Only two digits of SIC are coded by Revenue Canada with sufficient accuracy. In 
order to standardize the precision of estimates for SIC4 domains within each province, a
two-phase sample design was implemented. The first-phase sample of taxfilers is selected at 
Revenue Canada using strata defined using SIC2 and gross business income (size). Before the 
second phase sample is selected, an SIC4 code, considered more accurate than codes available 
from Revenue Canada, is assigned to each sampled unit by Statistics Canada. Strata defined 
using SIC4 and size are employed during selection of the second-phase sample. The same size 
boundaries are used for both phases of sampling. A detailed description of the sample design 
can be found in Choudhry, Lavallée and Hidiroglou (1989b). 

First-phase sample selection is done using Bernoulli sampling (also called Poisson sampling). 
Suppose that taxfiler $i$ falls in first-phase stratum $g$ within a particular province × SIC2 cell.
To determine whether taxfiler $i$ is included in the first-phase sample, a pseudo-random number
in the interval (0,1), say $R_i$, is generated using the taxfiler's unique identification number. The
taxfiler is included in the first-phase sample if $R_i \in (0, \nu_{1g})$. Bernoulli sampling based on a
different set of pseudo-random numbers is used to select the second-phase sample. Using 
Bernoulli sampling, selection and processing can begin before complete information about the 
taxfiler universe is available. This advantage of Bernoulli sampling is important, since taxfiler 
universe information is accumulated over a two-year period. Sample sizes obtained using 
Bernoulli sampling are random. Choudhry, Lavallée and Hidiroglou (1989b) derive the variance
of $\hat{Y}_h = \sum_g \sum_{i \in s_2 \cap gh} y_i/(\nu_{1g} \cdot \nu_{2gh})$ using simple random sampling as an approximation
to Bernoulli sampling as discussed in Sunter (1986). Under the approximation, a simple random
sample of fixed size $n_{1g} = \nu_{1g} \cdot N_g$ is selected in size stratum $g$ at the first phase. Let $n_{1gh}$ denote
the number of units with SIC4 $h$ in the first-phase sample for size stratum $g$. At the second
phase, a simple random sample of size $n_{2gh} = \nu_{2gh} \cdot n_{1gh}$ is selected for SIC4 $h$ and size stratum
$g$, with $\nu_{2gh}$ considered fixed. The variance of $\hat{Y}_h$ is given by


1 


] 
Vi, = ye — | i Aen, + ye ==? | . Bon» 
Ue” ph P Ug 


where 


2 
and $S_{gh}^2$ is the population variance in the second-phase SIC4 × size stratum $gh$.
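
The Bernoulli selection rule described earlier in this section, in which the pseudo-random number is derived from the taxfiler's identification number, can be sketched in Python as follows. The use of SHA-256 to derive the number, the function names, and the `phase` label are assumptions made for illustration; the text does not state which generator is actually used.

```python
import hashlib


def pseudo_random_unit(taxfiler_id, phase):
    """Map (phase, identification number) to a reproducible number strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{phase}:{taxfiler_id}".encode("utf-8")).digest()
    bits52 = int.from_bytes(digest[:7], "big") >> 4    # keep 52 of the 56 bits
    return (bits52 + 0.5) / 2**52                      # never exactly 0 or 1


def in_sample(taxfiler_id, phase, fraction):
    """Bernoulli selection: include the unit if its number falls below the fraction."""
    return pseudo_random_unit(taxfiler_id, phase) < fraction


ids = [f"T{i:05d}" for i in range(1000)]
first_phase = [i for i in ids if in_sample(i, 1, 0.30)]
second_phase = [i for i in first_phase if in_sample(i, 2, 0.50)]
print(len(first_phase), len(second_phase))
```

Because each unit's number depends only on its own identifier, selection can start before the full universe is known, and a different `phase` label gives the different set of pseudo-random numbers needed for the second phase. The realized sample sizes are random, as noted in the text.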


The plan of the paper is as follows. In Section 2, the optimal allocation problem is formulated 
in the context of the two-phase tax sample. An iterative solution procedure, called the exact 
method, is proposed. Section 3 includes a description of an approximation to the optimal 
allocation that can be used to obtain starting values for the exact method. The results of an 
empirical study involving comparison of various starting values for the exact method are 
reported in Section 4. Section 5 concludes the paper. 


2. EXACT METHOD 


In this section the optimal allocation problem is described and an iterative solution method, 
called the exact method, is proposed. To formulate the problem in the context of two-phase 
tax sampling, it is sufficient to consider one SIC2 cell in a particular province containing N 
units. The cost of selecting a unit in the first-phase sample is $K_1$, regardless of the stratum in
which the unit falls, while the cost of selecting a unit in the second-phase sample is $K_2$,
regardless of stratum. Under Bernoulli sampling, the cost function is

$$K_1 \cdot \sum_g n_{1g} + K_2 \cdot \sum_g \sum_h n_{2gh}.$$

Since sample sizes $n_{1g}$ and $n_{2gh}$ are random, we use the expected cost

$$F = K_1 \cdot \sum_g \nu_{1g} \cdot N_g + K_2 \cdot \sum_g \sum_h \nu_{1g} \cdot \nu_{2gh} \cdot N_{gh}. \qquad (1)$$


Rao (1973) and Smith (1989) also solve allocation problems for two-phase sample designs using 
expected values of random cost functions. In the tax sampling context, the total cost for a
province is the sum of the costs for all SIC2 cells within the province. The estimated coefficient 
of variation of the cost of two-phase tax sampling for the province of Quebec, calculated using 
1988 data, was about 1.85%. Coefficients of variation for overall (national) costs were smaller. 
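
As a small illustration of the expected cost, read here as $F = K_1 \sum_g \nu_{1g} N_g + K_2 \sum_g \sum_h \nu_{1g}\nu_{2gh} N_{gh}$, the Python sketch below evaluates it for given sampling fractions. The function name, the nested-list layout, and the numbers are hypothetical.

```python
def expected_cost(K1, K2, nu1, nu2, N_g, N_gh):
    """Expected two-phase cost under Bernoulli sampling.

    nu1[g]     : first-phase sampling fraction for size stratum g
    nu2[g][h]  : second-phase sampling fraction for size stratum g, class h
    N_g[g]     : population count in size stratum g
    N_gh[g][h] : population count in size stratum g, class h
    """
    first = K1 * sum(nu1[g] * N_g[g] for g in range(len(N_g)))
    second = K2 * sum(nu1[g] * nu2[g][h] * N_gh[g][h]
                      for g in range(len(N_g))
                      for h in range(len(N_gh[g])))
    return first + second


# Two size strata, two classes (illustrative numbers only).
cost = expected_cost(K1=1.0, K2=2.0,
                     nu1=[0.5, 1.0],
                     nu2=[[1.0, 0.5], [0.2, 0.4]],
                     N_g=[100, 50],
                     N_gh=[[60, 40], [30, 20]])
print(cost)
```

The second term uses the product of the two fractions because a unit reaches the second-phase sample only if it is selected at both phases.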

It is necessary to minimize (1) with respect to $\nu_{1g}$, $g = 1, 2, \ldots, G$, and $\nu_{2gh}$, $g = 1,
2, \ldots, G$, $h = 1, 2, \ldots, H$, subject to the constraints

where $C_h$ denotes the target coefficient of variation for SIC4 domain $h$.


Attempts at direct solution of this problem using the IMSL (1987) implementation of the 
successive quadratic programming algorithm of Schittkowski (1985) produced mixed results. 
The algorithm worked well for problems with small numbers of variables and constraints. 
However, satisfactory solutions for problems including more than approximately 35 variables 
or more than approximately 50 constraints could not be obtained. 

Some costs obtained using direct application of Schittkowski’s algorithm in the tax sampling 
context are given in Table 1. The algorithm was applied to the allocation problems for some 
SIC2 cells in the province of Quebec involving large numbers of variables and/or constraints 
using data for tax year 1988. All first-phase and second-phase sampling fractions were started 
at one when the direct approach was used. The lowest cost obtained using the method that 
we call the exact method, which will be described later in this section, is also given. The 
information in the table indicates that direct use of the IMSL implementation of Schittkowski’s 
algorithm is an inappropriate strategy for SIC2 cells with large numbers of variables and 
constraints. 

Table 1
Results for Direct and Exact Methods

SIC2    No. of variables    No. of constraints    Cost ($), direct    Cost ($), exact
30      62                  86                    Si553*              1897
35      37                  51                    551                 512
39      38                  50                    1667                1450
427*    39                  48                    Die                 3383

* Three digits of SIC are used for first-phase stratification for construction industries.
** The IMSL routine terminated with an internal error that could not be rectified after consulting published
documentation.

The exact method is based on a substantial simplification of the problem defined by (1) and
(2) that can be achieved by exploiting its structure. In particular, we divide the problem into
two main steps that can be solved iteratively. At the first step, (1) is minimized with respect
to $\nu_{1g}$, $g = 1, 2, \ldots, G$, conditional on values for all second-phase sampling fractions. This
step requires the use of nonlinear optimization techniques. The second step involves minimizing
(1) with respect to the second-phase sampling fractions, conditional on the values of the first-phase
sampling fractions obtained in the first step. No iterations are required for this minimization,
since it has a closed form solution. Furthermore, it can be done independently for each
$h = 1, 2, \ldots, H$. After completion of the second step, the first step is repeated and the iterative
process continued. Convergence is declared when changes in the cost function between
consecutive iterations are small.

Let $\nu_{1g}^{(i)}$ and $\nu_{2gh}^{(i)}$ denote the estimates of the optimal values of $\nu_{1g}$ and $\nu_{2gh}$ obtained after $i$
iterations (each iteration including one repetition of the two steps described above). At the
beginning of iteration $i + 1$, the transformation of variables given by $X_g^{(i+1)} = 1/\nu_{1g}^{(i+1)} - 1$
is required. This transformation redefines the optimization problem involved in the first step
of the iteration as a problem with linear constraints and a convex objective function. Such a
convex programming problem is easier to solve.


More precisely, each iteration involves: 


(i) Minimization of

with respect to $X_g^{(i)}$, $g = 1, 2, \ldots, G$, subject to the constraints

(ii) Calculation of $\nu_{1g}^{(i)} = 1/(X_g^{(i)} + 1)$, $g = 1, 2, \ldots, G$. Minimization, independently for
each $h = 1, 2, \ldots, H$, of

$$F_h = \sum_g \nu_{1g}^{(i)} \cdot \nu_{2gh}^{(i)} \cdot N_{gh}$$

with respect to $\nu_{2gh}^{(i)}$, $g = 1, 2, \ldots, G$, subject to the constraints

where $h$ is considered fixed.


It will be shown in Section 3 that solution of step (ii) does not require use of numerical 
methods. Therefore, the exact method only requires the solution of a series of convex 
programming problems, each involving only G variables. A convex programming problem is 
much easier to solve than a general nonlinear programming problem. A local solution of a 
convex programming problem is also a global solution. 

Let $F^{(i)}$ denote the value of the cost function, (1), obtained using $\nu_{1g}^{(i)}$ and $\nu_{2gh}^{(i)}$. The $F^{(i)}$
values form a monotonically decreasing sequence that is bounded below by zero and therefore converge to a limit. Whether
this limit value and the corresponding sampling fractions give the global minimum depends 
on the starting value. This problem is caused by the geometry of the constraints in (2). In practice 
one should try several starting values to get the best solution. One starting value is given by 
the approximate method, which is described in the next section and does not require iterations. 


3. APPROXIMATE METHOD 


In this section, an allocation method that gives an approximation to the optimal allocation 
is described. The method was first suggested by Choudhry, Lavallée and Hidiroglou (1989a). 
Assuming that all the second-phase sampling fractions are equal to one, an approximation to 
the optimal allocation of the first-phase sample is calculated. Then the second-phase sample 
is allocated, conditional on the first-phase sampling fractions. Since the cost of sampling a 
unit in both phases of sampling does not depend on the stratum in which the unit falls, 
minimizing cost is equivalent to minimizing sample size at each step of this method. 

At the first step of the method, an approximate solution to the optimal allocation problem 
for a one-phase sample design is calculated. This step involves finding the minimum,
independently for each $h$, of


$$F_h = \sum_{g} \nu_{g|h} \cdot N_g \qquad (3)$$


with respect to $\nu_{g|h}$, $g = 1, 2, \ldots, G$. The notation $\nu_{g|h}$ is used to denote the fact that a
sampling fraction for size stratum $g$ is determined subject to only one precision constraint,
namely the constraint for SIC4 domain $h$, where $h$ is fixed. In particular, the minimization
must be done subject to the constraints


1 

One can show that the minimum of (3) is obtained when (4) holds with equality, so that
the problem defined by (3), (4), and (5) is equivalent to finding the critical point of the Lagrangian

Setting the derivatives with respect to $\nu_{g|h}$ equal to zero yields
sin = CAs Bey) NGA (AN) oe le ee (6) 


Setting $\partial L/\partial \lambda = 0$ we obtain

After substitution of (7) into (6), we obtain the optimal sampling fraction for size stratum $g$
given only one precision constraint, for SIC4 domain h, 


Uelh = ( (Ag Ar Byn)/ Ng)” ‘ 


YC Aga Bja) wi: Gea Wee By) (8) 

If one or more of the sampling fractions given by (8) are greater than one, one can set them
equal to one and solve a modified allocation problem with a reduced number of strata. This
approach corresponds to the overallocation procedure discussed by Cochran (1977). It is
necessary to calculate (8) for $h = 1, 2, \ldots, H$. The approximate first-phase sampling fraction
for size stratum $g$, $\nu_g^{a}$, is set equal to the largest value in the set $\{\nu_{g|h},\ h = 1, 2, \ldots, H\}$ for
$g = 1, 2, \ldots, G$, an approach that ensures that the precision constraint for each SIC4 domain
will be satisfied.
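
The rule for combining the single-constraint solutions can be illustrated with a minimal Python sketch. It assumes the fractions from (8) have already been computed and that any overallocation has been resolved, so every input lies in (0, 1]. The function name, the array layout (rows are size strata, columns are SIC4 domains) and the numbers are hypothetical and serve only as an illustration.

```python
import numpy as np

def approximate_first_phase(nu_cond):
    """Return the largest single-constraint fraction for each size stratum.

    nu_cond[g, h] holds the fraction for size stratum g computed under the
    precision constraint for domain h alone (hypothetical layout).
    """
    nu_cond = np.asarray(nu_cond, dtype=float)
    if np.any(nu_cond <= 0) or np.any(nu_cond > 1):
        raise ValueError("fractions must lie in (0, 1]; resolve overallocation first")
    # Taking the maximum over domains keeps every domain's constraint satisfied.
    return nu_cond.max(axis=1)

# Illustrative values: 3 size strata, 3 SIC4 domains.
nu_cond = np.array([
    [0.20, 0.35, 0.10],
    [0.50, 0.40, 0.65],
    [1.00, 0.80, 0.90],
])
nu_a = approximate_first_phase(nu_cond)
print(nu_a)
```

Because each domain's constraint is met by its own single-constraint solution, taking the maximum over domains can only raise a sampling fraction, which is why the result is feasible but generally not optimal.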

Given first-phase sampling fractions, optimal second-phase sampling fractions can be easily 
determined. Assume that, for the SIC2 x province cell h, the size strata included in the 
allocation problem correspond to a set of integers, $\Gamma$. We set the second-phase sampling
fractions equal to one for those size strata that are not included in the allocation problem.
Normally, one would have $\Gamma = \{1, 2, \ldots, G\}$ but because of overallocation during allocation
of the second-phase sample, for example, $\Gamma$ may not include all integers between 1 and $G$.
The problem of allocating the second-phase sample is equivalent to the problem of finding 
the minimum of 


$$F_h = \sum_{g \in \Gamma} \nu_{gh} \cdot \nu_g \cdot N_{gh} \qquad (9)$$


with respect to $\nu_{gh}$, $g \in \Gamma$, subject to the constraints
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where 


1 
My = Ch Yi Sy (22-1) - gn + Be. 
&§ 

Note that the expected number of units with SIC4 $h$ in the second-phase sample for size
stratum $g$, $\nu_g \cdot N_{gh}$, is employed in (9). It is easy to show that (9) attains a minimum when the
constraint (10) holds with equality. Consequently, the minimization problem is equivalent to
finding the critical point of the Lagrangian


1 A 

with respect to $\lambda$ and $\nu_{gh}$, $g \in \Gamma$, subject to the constraints
$0 < \nu_{gh} \le 1$, $g \in \Gamma$.
Setting the first derivatives of the Lagrangian equal to zero and simplifying, one obtains

Note that there is no solution to the allocation problem unless $D_h$ is positive. Substituting (13)
into (12) yields


$$\nu_{gh} = (A_{gh}/N_{gh})^{1/2} \cdot (1/\nu_g) \cdot \sum_{g' \in \Gamma} (N_{g'h} \cdot A_{g'h})^{1/2} / D_h \qquad (14)$$


If $\nu_{gh}$ is greater than one for certain $g$ and $h$, the overallocation procedure described above can
obviously be employed. Note that (14) also provides the solution for step (ii) of each exact 
method iteration. 
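
The closed form (14) is simple to evaluate. The following Python sketch computes it for one cell $h$ and checks the result numerically. Two points are assumptions rather than statements from the text: the binding constraint is taken to have the form $\sum_{g} A_{gh}/(\nu_g \nu_{gh}) = D_h$, and the function name and input values are invented for illustration. Overallocation (fractions above one) is not handled here.

```python
import numpy as np

def second_phase_fractions(A, N, nu_first, D):
    """Evaluate (14) for one cell h.

    A, N     : arrays over size strata g in the allocation set (A_gh, N_gh)
    nu_first : first-phase sampling fractions nu_g
    D        : the scalar D_h, which must be positive
    """
    A = np.asarray(A, dtype=float)
    N = np.asarray(N, dtype=float)
    nu_first = np.asarray(nu_first, dtype=float)
    if D <= 0:
        raise ValueError("no solution unless D_h is positive")
    total = np.sqrt(N * A).sum()            # sum over g' of (N_g'h * A_g'h)^(1/2)
    return np.sqrt(A / N) / nu_first * total / D

# Illustrative inputs for two size strata.
A = [4.0, 9.0]
N = [100.0, 400.0]
nu_first = [0.5, 1.0]
D = 40.0

nu_second = second_phase_fractions(A, N, nu_first, D)

# Under the assumed constraint form, the fractions reproduce D_h exactly.
constraint_value = (np.asarray(A) / (np.asarray(nu_first) * nu_second)).sum()
print(nu_second, constraint_value)
```

If any returned fraction exceeds one, the stratum would be fixed at one and the problem re-solved for the remaining strata, as in the overallocation procedure described for the first phase.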


4. EMPIRICAL STUDY 


The approximate method serves two purposes. First, it provides a good starting value for 
the exact method. Second, it may be easier to implement in practice. In this section, we report 
the results of an empirical comparison using data from the province of Quebec for tax year 
1988. Results obtained using the exact method with various starting points, as well as the 
approximate method, are reported. Since the quantities N,,, Y, and Sin required by both 
methods were unknown, estimates based on the data were used. 

The size stratification used by the survey, including four take-some strata and one take-all
stratum, was employed. Allocations were computed for 64 SIC2 cells (all of the Quebec data 
excluding a few small SIC2s). The number of sampling fractions determined in these allocations 
ranged from 8 to 92 with a median of 24. The number of constraints ranged from 9 to 115 
with a median of 31. There were 20 SIC2 cells involving more than 35 variables and 18 of these 
cells also involved more than 50 constraints. A total of 1850 second-phase strata including about 
230,000 population units were involved. 

The first-phase sampling cost, corresponding to the cost of microfilming or photocopying 
a tax return at Revenue Canada, sending the information to Statistics Canada and determining 
an SIC4 code, was set at $1.40 per unit. The second-phase sampling cost, corresponding to 
the cost of transcribing values for financial variables, was set at $7.00. These costs are 
comparable to those incurred during operation of the actual survey. 

Allocations were computed using the exact method with three starting values: I - solution 
of the approximate method; II - all first-phase sampling fractions set to one with the corre- 
sponding conditionally optimal second-phase fractions; and III - a randomly chosen set of 
feasible first-phase sampling fractions, with the corresponding conditionally optimal second- 
phase fractions. In addition, the exact method was started at a perturbation of each of these 
starting values. The perturbed value for the first-phase sampling fraction for size stratum g 
for starting value I was $\nu_g^{(0)} = 0.1 + 0.9 \cdot \nu_g^{a}$, where $\nu_g^{a}$ is the solution of the approximate
method. Second-phase sampling fractions were started at values that are optimal, conditional
on the perturbed first-phase fractions. Starting value III was perturbed analogously. The
perturbed value corresponding to starting value II was $0.1 + 0.9 \cdot \nu_{gh}$, where $\nu_{gh}$ is
optimal, conditional on a census at the first phase of sampling. For each starting value, the
best result obtained using either the value itself or the corresponding perturbed value was 
retained. Convergence was declared if the absolute relative change in the cost function between 
consecutive iterations was less than 10~*. The IMSL implementation of Schittkowski’s 
successive quadratic programming algorithm was used to solve nonlinear programming 
problems. 
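
The perturbation rule and the stopping rule described above can be sketched in a few lines of Python. The function names are hypothetical, the tolerance is left as a parameter to be supplied by the user, and the toy `step` function merely stands in for one full iteration of the exact method; it is not the allocation algorithm itself.

```python
def perturb(fraction):
    """Perturbed starting value: 0.1 + 0.9 * fraction, which stays in (0, 1]."""
    return 0.1 + 0.9 * fraction

def has_converged(prev_cost, cost, tol):
    """Absolute relative change in the cost function below the tolerance."""
    return abs((cost - prev_cost) / prev_cost) < tol

def run_until_converged(step, start_cost, tol, max_iter=10_000):
    """Apply `step` repeatedly until the stopping rule is met."""
    prev = start_cost
    for iteration in range(1, max_iter + 1):
        cost = step(prev)
        if has_converged(prev, cost, tol):
            return cost, iteration
        prev = cost
    return prev, max_iter

# Toy stand-in for one iteration: the cost decreases monotonically toward 50.
toy_step = lambda c: 50.0 + 0.5 * (c - 50.0)
final_cost, n_iter = run_until_converged(toy_step, start_cost=100.0, tol=1e-6)
print(final_cost, n_iter)
```

A relative criterion of this kind is scale free, so the same tolerance can be used for cells whose total costs differ by orders of magnitude.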

Results are reported in Table 2. Total costs for four alternatives are given. In addition, the 
number of SIC2 cells for which each starting value for the exact method produced better results 
than alternative starting values is shown. Computing costs are not reported, since they were 
small enough to be inconsequential. 

The results indicate that the approximate solution provided the best starting values for the 
exact method. Although starting value II produced better results than starting value I for 17 
SIC2 cells, the total cost associated with starting value II was higher than the total cost for 
the approximate method. The exact method performed poorly when starting values were 
determined by random selection of a feasible set of first-phase sampling fractions. 


Table 2
Results for Exact and Approximate Methods


                                   Exact - Starting value
Method                             I         II        III       Approximate

Total cost ($)                   122779    139347    200998        130228

No. cells with best result*        48        17         1


* For two cells starting values I and II produced the same result, which had lower cost than the result obtained using
starting value III. Consequently, the numbers reported in this row of the table add to 66 rather than 64.

Although the total cost using the exact method with starting value I was only 5.7% lower
than the cost of the approximate method, it should be noted that the exact method with starting 
value I can do no worse than the approximate method. The exact method with starting value I 
produced better results than the approximate method for 42 cells. 
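
The 5.7% figure can be checked from the total costs in Table 2. The short Python sketch below does the arithmetic; the dictionary keys are illustrative labels, and the assignment of the four printed totals to the four methods is the reading of the table that is consistent with the percentage quoted in the text.

```python
# Total costs ($) from Table 2.
total_cost = {
    "exact_I": 122779,
    "exact_II": 139347,
    "exact_III": 200998,
    "approximate": 130228,
}

def percent_lower(cost, baseline):
    """How much lower `cost` is than `baseline`, in percent of the baseline."""
    return 100.0 * (baseline - cost) / baseline

saving = percent_lower(total_cost["exact_I"], total_cost["approximate"])

# Cells with the best result: 48 + 17 + 1, with two ties counted twice.
best_counts = [48, 17, 1]

print(round(saving, 1), sum(best_counts))
```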


5. CONCLUSION 


A sample allocation problem for two-phase survey designs is formulated as a constrained 
optimization problem in Sections 1 and 2. If the numbers of variables and constraints involved
in the problem are small, the solution can be obtained through direct application of numerical 
methods. However, the direct approach does not work well for large numbers of variables and 
constraints. 

By exploiting the mathematical structure of the problem, it can be divided into two sub- 
problems: the first is a convex programming problem with linear constraints that involves a 
much smaller number of variables, and the second can be solved without the use of numerical 
methods. The algorithm proposed in Section 2 consists of iterations between the two 
subproblems. It is computationally simpler and more effective in practice than the direct 
approach for problems involving large numbers of variables and constraints. An approximate 
solution to the sample allocation problem that does not require use of numerical methods is 
proposed in Section 3. The empirical study in Section 4 shows that it works especially well as 
a starting value for the algorithm proposed in Section 2. 


The Role of the Interviewer in Survey Participation


MICK P. COUPER and ROBERT M. GROVES¹


ABSTRACT 


Using data from a survey of U.S. Census Bureau interviewers, this paper examines whether experienced 
interviewers achieve higher response rates than inexperienced interviewers, controlling for differences 
in survey design and attributes of the populations assigned to them. After demonstrating that the 
relationship is positive and curvilinear, it attempts to explain the mechanisms by which experienced inter- 
viewers achieve these rates and elaborate the nature of the relationship. It examines what behaviors and 
attitudes underlie the higher success, with the hope that they might be instilled in trainees. 


KEY WORDS: Interviewers; Nonresponse; Response rates; Survey participation. 


1. INTRODUCTION 


Survey methodologists have long suspected the interviewer to be an important source of 
variation in response rates. Indicators of this include observed differences among trainees in 
the ability to absorb and put into practice the interviewing guidelines, interviewer variation 
in item missing data rates, individual interviewers’ response rates, and the ability of some inter- 
viewers to convert the initial refusals of others. However, several of these indicators are affected 
by the fact that interviewers often do their work in different subpopulations, and thus face 
different challenges to complete their assignments. 

Much of what we believe about the impact of the interviewer on survey participation remains 
untested or inconclusive. In an oft-cited study, Durbin and Stuart (1951) found experienced 
interviewers to be “decidedly superior” to student volunteers in terms of response rates. Groves
and Fultz (1985) found that novice interviewers (1 to 6 months of tenure) had the highest refusal 
rates in a telephone survey. In a study cited by Inderfurth (1972), nonresponse rates for Census 
Bureau interviewers trained in 1962 and 1963 declined steadily over the first months of service, 
reaching the level of experienced interviewers after 22 months. In contrast, Singer, Frankel 
and Glassman (1983, p. 74) found the effect of experience on response rates in a telephone 
survey to be counter-intuitive, that is, more experienced interviewers did not achieve higher 
response rates. They do note, however, that this result is based on only six interviewers. In 
a study of 16 field interviewers in Sweden, Schyberger (1967) found nonresponse rates to be 
higher for experienced than for newly recruited interviewers. In short, the common belief of 
experienced interviewers being more successful is not uniformly supported empirically. 

This paper examines the role of various interviewer characteristics, particularly experience, 
in achieving respondent cooperation. It should be noted that the interviewer represents only 
one part of a large set of factors that can affect survey participation. Such factors include 
respondent characteristics, the respondent-interviewer interaction, survey design features, and 
contextual and situational factors. For a review of these factors, see Groves, Cialdini and 
Couper (1992). 


¹ Mick P. Couper and Robert M. Groves, U.S. Bureau of the Census and University of Michigan. Room 2315-3,
Bureau of the Census, Washington, DC 20233. 

We should also note that different models may be more suitable for different components
of nonresponse. For instance, interviewer motivation, tenacity and effort expended may be 
more important in reducing noncontacts, while persuasion skills play a greater part in the refusal 
component of nonresponse. The data analyzed here do not permit us to distinguish between 
these components of nonresponse. This may weaken the explanatory power of the models tested. 

In this paper we will address two questions: (a) do experienced interviewers achieve higher 
response rates? (b) if so, what are the mechanisms underlying the relationship between 
experience and rates? These questions are important to the survey research community. If the 
behaviors used by successful experienced interviewers can be taught to inexperienced inter- 
viewers, then their success might be transferred to the new recruits. If not, then the value of 
reducing turnover among experienced interviewers remains high for survey organizations. 


2. TOWARD A MODEL OF SURVEY PARTICIPATION 


A number of interviewer characteristics can be identified that have a potential impact on 
survey participation. These are illustrated in Figure 1. The effects of interviewer experience, 
expectations and behavior on response rates, controlling for assignment area and survey design 
features, will be explored. Each of the sets of variables will be discussed in turn. 


2.1 Interviewer experience 


First, interviewers’ experience is expected to have a positive effect on the response rates they 
obtain. This stems from lessons learned through trial and error application of alternative tech- 
niques over time, and from alternative training guidelines and experiences on different surveys. 
Experience thus has two components: length and breadth. Length of experience might be 
indicated by the number of years a person has worked as an interviewer. One indicator of 
breadth of experience is the number of different organizations an interviewer has worked for, 
or the number of different kinds of studies an interviewer has worked on. It is argued that length 
and breadth of experience both serve to increase the variety of different interviewing situa- 
tions to which an interviewer is exposed. 

We expect the relationship between length of experience (as measured by tenure) and response 
rates to be curvilinear. Experience in the first few years of interviewing will have a greater impact 
on response rates than in later years. After a certain point, the number of new situations faced 
by interviewers declines, and interviewers become comfortable dealing with the wide variety 
of sample persons and assignment areas they may face. After this, additional years of experience 
may not produce further gains in response rates. 

An alternative hypothesis is that self-selection rather than experience produces higher 
response rates among interviewers with longer tenure. In other words, it is not that individual 
interviewers get better over time, but that better interviewers tend to stay, while weaker 
interviewers leave the job. We believe that a combination of these two factors explains varia- 
tions in interviewer performance. However, the self-selection hypothesis cannot be tested in 
a cross-sectional study such as this, and caution must be exercised in drawing inferences from 
these analyses. 

If experienced interviewers achieve higher response rates, we hypothesize that this takes place 
through the intervening effects of interviewer expectations (e.g. confidence) and behavior (e.g. 
effective oral presentation). Note that we posit no direct effect of experience on response rates. 
In other words, is it possible to identify interviewer attitudes and behaviors that may account 
for possible differences in response rates?


2.2 Interviewer expectations


It is hypothesized that positive interviewer expectations lead to higher response rates. Inter- 
viewers who have a greater belief in their ability to persuade sample persons to participate, 
who believe in the legitimacy of the work they are doing, and who are confident that most people 
agree to participate in surveys, are likely to get higher response rates than those who believe 
otherwise. This argument has some empirical support in the study by Singer, Frankel and 
Glassman (1983), in which it was found that interviewers who anticipated prior to the survey 
that the task of persuading respondents was “moderately easy”, achieved higher response rates
than those who believed the task to be “moderately difficult”.


2.3 Interviewer behavior


With regard to interviewer behaviors, we seek to identify the mechanisms by which greater 
experience and positive expectations translate into higher response rates. The behavior of 
interviewers in gaining cooperation from sample persons may be likened to that of other 
“compliance professionals” (such as salespersons, fundraisers, etc.). Based on an extensive
review of experimental and observational evidence, Cialdini (1984, 1990) identifies six 
compliance principles used to decide whether to accede to a request. Briefly, these principles 
are as follows: 


(a) Reciprocation: One should be more willing to comply with a request to the extent that the 
compliance constitutes the repayment of a perceived gift, favor, or concession. 


(b) Consistency: After committing oneself to a position, one should be more willing to comply 
with requests for behaviors that are consistent with that position. 


(c) Social validation: One should be more willing to comply with a request to the degree that 
one believes that similar others would comply with it. 


(d) Authority: One should be more willing to yield to the requests of someone who one perceives 
as a legitimate authority. 


(e) Scarcity: One should be more willing to comply with requests to secure opportunities that 
are scarce. 


(f) Liking: One should be more willing to comply with requests of liked others. 


We are interested in the extent to which interviewers make use of these principles to persuade 
sample persons to participate in a survey. 

It is argued that interviewers who make appropriate use of each of these strategies are likely 
to have greater success in persuading reluctant sample persons to participate. However, the 
use of such techniques indiscriminately in all situations may backfire. For example, the invoca- 
tion of the authority principle in areas where suspicion of government is high may well have 
a negative effect on cooperation. The use of these compliance principles may not be univer- 
sally effective in all situations or for all sample persons. 

Thus, it is not just whether these techniques are used by interviewers, but also how they are 
used. Two concepts are of interest here. One is the number of different techniques that an inter- 
viewer has at his/her disposal, and the second is how appropriately such techniques are applied. 
The first we will refer to as the “repertoire of techniques” available to the interviewer. A novice
interviewer may learn one or two “canned” introductions during training, and use them on
all sample persons he/she encounters. In contrast, the experienced interviewer has a wide reper- 
toire of approaches upon which to draw, and can apply them as the situation warrants. 

The second concept is that of appropriate application of the skills or techniques at the inter-
viewer’s disposal. We refer to this as “tailoring”. An interviewer is expected to be an “astute
psychological diagnostician’’ (Cannell 1964), to be able to size up a situation quickly, and apply 
the appropriate persuasive messages. These skills are gained through experience, either on the 
job or in life in general. The novice interviewer, with fewer skills and less confidence, may rigidly 
adhere to a small number of “tried and trusted” approaches. The experienced interviewer is
better able to tailor his/her approach to each potential respondent. 

It may be that adaptability and appropriate application of persuasive techniques are more 
critical than the actual behaviors or techniques themselves. If so, it should be possible to develop 
a more parsimonious model using only the latter concepts and dropping the specific behaviors 
measured. 


2.4 Assignment area 


To examine the effect of interviewers on survey participation, we need to take into account the 
fact that they are assigned different areas to interview. Ideally, the research design would have 
randomly assigned interviewers to sample areas, removing any statistical confounding between 
interviewer and population characteristics. Without such randomization, we attempt to specify 
those population characteristics important to response rate and statistically control for them. 

First, the problem of obtaining cooperation from sample persons in inner-city areas is well 
known (see Steeh 1981, Smith 1983). House and Wolf (1978) found that rising crime rates, 
particularly in high density urban areas, have been a major deterrent to survey participation, 
and to trusting and helping behavior in general (Korte and Kerr 1975). We expect this arises 
both because of residents’ reluctance to interact with strangers, and unease among interviewers 
on entering these neighborhoods. 

Turning to characteristics of sample households, household size has been found to correlate 
positively with response rates (see Gower 1979; Paul and Lawes 1982; Rauta 1985). Single- 
person households tend to have relatively high refusal rates (see Brown and Bishop 1982; Wilcox 
1977). This may be due in part to the large proportion of elderly persons living alone. Families 
with dependent children, on the other hand, tend to have higher response rates. Lievesley 
(1988) notes that higher response rates in certain areas of the U.K. may be explained by 
the high probability of finding someone at home arising from high proportions of children 
aged 0-4. 

The findings on sample person characteristics are somewhat more mixed. A number of 
researchers (see Brown and Bishop 1982; Hawkins 1975; Herzog and Rogers 1988; Weaver 
1975) have found age to be associated with nonresponse. The impact of other sample person
characteristics such as race, education, socio-economic status, gender, etc. is somewhat
inconsistent (see Groves (1989) and Goyder (1987) for reviews of these factors).


2.5 Survey design features 


Finally, survey design features (topic, burden, respondent selection rules, etc.) are likely
to influence a sample person’s decision to participate, both directly and in terms of constraints 
on interviewer expectations and behavior. 


2.6 Interaction effects on response rate 


We suspect that there may be a number of statistical interaction effects of influences on 
nonresponse. One question is whether there are some areas (such as high density central city 
areas) in which interviewer experience is more important than other areas. For example, high 
density urban areas may be more diverse, requiring greater experience to deal with a greater 
variety of different situations. Behavior in areas where the situations presented to interviewers
are all very similar could be more easily learned, as fewer persuasion strategies would be needed. 

We also suspect that different surveys may obtain varying response rates for different 
subpopulations as a result of the differential salience of the survey topic to such groups. For 
example, it may be expected that the National Crime Survey (which focuses on criminal 
victimization) may get higher response rates in high crime areas than in low crime areas. 
Similarly, the National Health Interview Survey (which measures health-related activities) may 
obtain higher response rates in areas with an older than average population. Similar interactions 
may be expected between the Consumer Expenditure Survey and such variables as average 
household size and income level. 


3. METHOD 


3.1 Data collection strategies 


The results in this paper are part of a larger study of survey participation in face-to-face 
surveys in the United States. The first part of the work involved a series of focus groups with 
interviewers working on a variety of different surveys around the country. The insights gained 
from these groups led to the development of a structured questionnaire to test some of these 
hypotheses on a larger audience of interviewers. 

The interviewer surveys had the goal of measuring behavioral, experiential and attitudinal 
influences on levels of cooperation obtained by interviewers. The questionnaire was developed 
and tested by staff at the Survey Research Center in collaboration with staff from the U.S. 
Census Bureau. 

This questionnaire was administered to U.S. Census Bureau interviewers working on the 
following three personal visit surveys: 


(a) the Consumer Expenditure Quarterly Survey (CE), sponsored by the Bureau of Labor 
Statistics; 


(b) the National Health Interview Survey (HIS), sponsored by the National Center for Health 
Statistics; and 


(c) the National Crime Survey (NCS), sponsored by the Bureau of Justice Statistics. 


The questionnaire was mailed in February, 1990, to Census Bureau interviewers working 
on these three surveys. All interviewers were paid their normal salary rate for completing the 
questionnaire (most were paid for an hour of their time). In an effort to seek candid responses 
and eliminate the threat of supervisory intervention, interviewers were assured that their indi- 
vidual responses would not be seen by or discussed with any of their supervisors, and that the 
results would be reported only as statistical totals. 

Questionnaires were mailed back to the central office. Reminder letters and telephone calls 
were used to increase the response rate. A total of 1,013 completed questionnaires were received, 
representing a response rate of 97.1%. A number of questionnaires were excluded from the 
analyses reported here. All supervisory interviewers (256) were excluded. These people often 
have no regular assignments of their own, and typically work on a number of different surveys. 
They are often used for refusal conversion, or to “clean up” otherwise incomplete assignments.
With supervisory interviewers excluded, transfer of assignments from one interviewer to another 
on these surveys is rare. For purposes of calculating interviewer-level response rates, each 
nonresponse case was counted against the original interviewer, regardless of whether it was 
later converted by another. In addition, those interviewers who started work during the period 
in which the interviewer survey was administered, and for whom no historical response rate
information was available, were also excluded (46 interviewers). This left a total of 711 inter- 
viewers, 207 from CE, 139 from HIS and 365 from NCS. The numbers of cases included in 
the analyses may be further reduced due to missing data on certain variables. 


3.2 Data structure 


In addition to the questionnaire responses, other variables were added to the data file. These 
included a set of variables to represent each interviewer’s assignment area. Typically, the 
primary sampling unit (PSU) in which an interviewer works consists of one or more coterminous 
counties. County-level data were extracted from the County and City Data Book (Bureau of 
the Census 1988), aggregated to the PSU level, and attached to the interviewer records. Note 
that these variables can only reflect gross differences in assignment area and cannot, for 
example, distinguish between central city and suburban areas. 

The date each interviewer was hired by the Census Bureau was obtained from administrative 
records to create a variable to serve as a measure of tenure. Although it does not indicate length 
of experience on a particular survey, it does reflect the length of time an interviewer was 
employed by the Census Bureau. 

A major drawback of this study is that it was not possible to obtain measures of race, age, 
gender, or other demographic attributes of the interviewer. Confidentiality restrictions 
prevented access to personnel records for this information, nor could these be asked in the
interviewer questionnaire. 


3.3 Analytic plan 


Three different surveys are represented in the data set. Instead of introducing control 
variables measuring key design features of the surveys, dummy variable indicators of the survey 
were used to control on important design differences among them. 

The dependent variable is aggregate response rate for the six month period, October 1989, 
through March 1990. It was not possible to obtain interviewer-level data on the components of 
nonresponse (particularly refusals) for this period. These rates thus do not distinguish between 
noncontact and refusal components of nonresponse. Hence, it should be noted that the analyses 
reported here are based on interviewer-level response rates rather than refusal rates. 

The nonresponse rates for the three surveys for 1990 (based on national sample totals) are 
presented in Table 1. 

Refusals as a proportion of total nonresponse vary from 87% for CE to 52% for NCS.
We suspect that different sets of factors operate to affect these two components of nonresponse. 
Ideally, separate models would be fitted for each component, but this was not possible given 
the current data. To the extent that factors affecting refusals are different from those affecting 
other components of nonresponse (such as noncontacts), the results will be confounded (see 
Lievesley 1988). It can also be seen that nonresponse rates for these three surveys are low to 
begin with. This may further restrict the ability of these models to explain differences among 
interviewers. 
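
The refusal shares quoted above follow from the rates in Table 1. A minimal Python sketch of the arithmetic is given below; the dictionary layout and names are illustrative only. For NCS the nonresponse rate is backed out from the reported 52% share and the 1.6% refusal rate, rather than entered directly.

```python
# (nonresponse rate %, refusal rate %) as given in Table 1.
rates = {"CE": (13.4, 11.6), "HIS": (4.5, 2.8)}

def refusal_share(nonresponse, refusal):
    """Refusals as a percentage of total nonresponse."""
    return 100.0 * refusal / nonresponse

ce_share = refusal_share(*rates["CE"])
his_share = refusal_share(*rates["HIS"])

# NCS: a refusal rate of 1.6% and a refusal share of 52% imply the nonresponse rate.
ncs_nonresponse = 1.6 / 0.52

print(round(ce_share), round(his_share), round(ncs_nonresponse, 1))
```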

Given that the size of the interviewer assignments varies (and hence affects the variance of
the measured individual response rates), we used weighted least squares (WLS) with assignment
size as the weight. Comparisons of the WLS results with those using ordinary least squares 
(OLS) solutions were made, and it was found that WLS reduces the size of the coefficients 
marginally, but does not affect the sign or relative strength of the coefficients. All the analyses 
reported here are based on the WLS solutions. 
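
The weighting scheme can be illustrated with a short Python sketch using NumPy. Everything in it is an assumption made for illustration: the data are simulated, the variable names are hypothetical, and only log tenure and two survey indicators are used as predictors. WLS is implemented by scaling each row by the square root of its weight and then solving an ordinary least squares problem.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares via row scaling by sqrt(weight)."""
    sw = np.sqrt(np.asarray(w, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 300
size = rng.integers(20, 200, n)                  # simulated assignment sizes
log_tenure = np.log(rng.uniform(0.5, 20.0, n))   # simulated tenure in years, logged
survey = rng.integers(0, 3, n)                   # 0 = CE, 1 = HIS, 2 = NCS

# Design matrix: intercept, log tenure, and dummy indicators for HIS and NCS.
X = np.column_stack([np.ones(n), log_tenure, survey == 1, survey == 2]).astype(float)
beta_true = np.array([85.0, 2.0, 8.0, 10.0])

# Response rates measured on larger assignments are less variable.
y = X @ beta_true + rng.normal(0.0, 30.0 / np.sqrt(size))

beta_wls = wls(X, y, size)          # assignment size as the weight
beta_ols = wls(X, y, np.ones(n))    # equal weights reduce WLS to OLS
print(beta_wls, beta_ols)
```

Weighting by assignment size is appropriate when the variance of an interviewer's response rate is roughly inversely proportional to the number of cases assigned, as it would be for a binomial proportion.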

Table 1 
1990 Nonresponse Rates for Three Surveys 

                                 Nonresponse    Refusal 
Survey                           rate (%)       rate (%) 
Consumer Expenditure Survey      13.4           11.6 
Health Interview Survey           4.5            2.8 
National Crime Survey             a]             1.6 


A series of tests was performed to determine the appropriateness of the models specified. 
A number of outliers in the dependent variable were detected. However, removal of these 
outliers had little or no effect on the results obtained, and they were therefore retained in 
all analyses. Tests of the normality assumption were also conducted. The normal probability 
plots show that the residuals from these models do not differ markedly from a normal 
distribution. 

It is hypothesized that the effect of tenure on response rate is greater in the first few years. 
The tenure variable is transformed (the natural log is used) to reflect this. The transformed 
variable indeed produced an improvement in fit over the linear tenure variable. 
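
A short sketch of why the log transformation encodes a larger effect in the first few years: under a model linear in log(tenure), the predicted gain from one extra year shrinks as tenure grows. The coefficient value below is hypothetical and chosen only for illustration.

```python
import math


def tenure_effect(years_from, years_to, coef):
    """Change in predicted response rate when tenure moves between two values
    under a model that is linear in log(tenure)."""
    return coef * (math.log(years_to) - math.log(years_from))


coef = 2.0  # hypothetical coefficient on log(tenure), in percentage points
early_gain = tenure_effect(1, 2, coef)    # second year of service
late_gain = tenure_effect(10, 11, coef)   # eleventh year of service
print(early_gain, late_gain)
```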

A more detailed description of the variables used in these analyses can be found in 
Appendix A. 


4. LIMITATIONS 


Before describing the analyses, it is important to note some of the limitations of these data. 
First, these findings refer only to interviewers working on three ongoing national surveys at 
the Census Bureau at the time at which the interviewer survey was conducted. It is not possible 
to generalize to other face-to-face or telephone surveys conducted by academic or private sector 
organizations. 

Furthermore, the data are cross-sectional in nature. Cohort and period effects are confounded 
with the effects of experience. That is, any observed response rate differences by interviewer 
experience may be due to changes in the quality of interviewers hired over time, in the 
effectiveness of interviewer training over time, or in differential turnover by interviewer quality. 
Hypotheses can be constructed to support both positive and negative effects of these factors 
on response rates. Hence, the measured impact of interviewer experience on response rates 
is a complex combination of these factors. Longitudinal measurement of interviewers is needed 
to disentangle these effects. 

Interviewers are not randomly assigned to areas. Although we have attempted to control 
for a number of characteristics of assignment area that may impact on response rates, there 
may be many other factors that could explain differences in response rates across assignment 
area. Further, we are limited to weak controls, on attributes of counties and groups of counties, 
not on attributes of specific assignment areas within counties given to interviewers. A hierarchical 
analysis containing data on individual respondents and interviewers assigned to them 
would improve these control factors. 

Finally, the dependent variable was measured for a time period up to and including the 
administration of the interviewer questionnaire. More recent response rate data were not 
available at the time. Given that behaviors and expectations were not measured before the 
response rates were obtained, caution should be exercised in attributing causality. 

Despite these limitations, these data provide us with the opportunity to test prevailing beliefs 
about the role of interviewer experience in response rates, and to explore the role of interviewer 
expectations and behavior in face-to-face surveys. 


5. RESULTS 


First, we measured the impact of experience, controlling for characteristics of assignment 
areas and dummy variables for the surveys (Model 1 in Table 2). Let us first examine the 
coefficients of the control variables. With few exceptions, the assignment area variables 
have a significant impact on response rates. Both population density and crime rate act as 
expected, with lower response rates being obtained in high crime, high density areas. The 
negative effect of household size is contrary to expectation. This may be explained in part by 
the fact that these surveys all collect information from or about all adult household members, 
thereby increasing the reporting burden for large households. This is unlike many surveys, 
in which a single adult is selected from each household. The effect of age is as hypothesized, with 
response rates tending to be lower (but not significantly so) in areas with larger proportions 
of persons over 65, but higher in areas with many households that have young children. 

The large effects for the two survey variables (relative to the omitted category of the 
Consumer Expenditure Survey) reflect differences in the mean response rates for these three 
surveys. Such differences can be attributed to a host of survey design differences (length of 
the interview, respondent selection rules, panel versus cross-sectional designs, content of the 
questionnaires, etc.) that are beyond the scope of this paper. Nevertheless, it is clearly necessary 
to control for these differences. 

Now, let us examine the measured effect of experience, given these control variables. It can 
be seen that tenure has a strong positive effect on response rates, even when controlling for 
the nature of the area to which an interviewer is assigned. This appears to confirm prevailing 
beliefs about the role of interviewer experience. Interviewer differences in response rates appear 
to be more than simply artifacts of differences in the areas to which they are assigned, and 
experience plays a key role in such interviewer differences. 

The inclusion of an indicator for breadth of experience was also tested, but found to have 
no significant effect in the presence of the remaining variables. It thus appears that, for Census 
Bureau interviewers at least, experience working for other survey organizations has no 
marginal impact on response rates over and above that of tenure. 

Does tenure have a differential impact on response rates in different assignment areas? 
Model 2 in Table 2 includes an interaction term between the log of tenure and population 
density. An additional interaction term between tenure and crime rate was also tested, but this 
coefficient was found to be insignificant, and the interaction had little impact on remaining 
elements of the model. The interaction term in Model 2 is statistically significant, but the sign 
is opposite to that expected. We hypothesized that experience would have a greater impact in 
high density areas, but this does not appear to be the case. An alternative explanation may 
be a ‘‘burnout effect’’. More experienced interviewers in high density urban areas may be losing 
their enthusiasm sooner than experienced interviewers in less stressful rural areas, and this 
contributes to lower response rates. Interviewer burnout may be one factor contributing to 
higher turnover rates in the large metropolitan areas. 

Interactions between the three surveys and various assignment characteristics were also tested. 
None of these appears to have any noticeable effect in these models, and they are not discussed further. 
As a further test for the presence of additional interactions involving the survey variables, 
separate models were fitted for each of the three surveys. The models obtained are essentially 
the same for each of the three surveys examined. Thus, although the level of response differs 
across the three surveys, the relative impact of tenure on response rates appears to be the same. 

Given that it appears that experienced interviewers achieve higher response rates regardless 
of the areas to which they are assigned, we can proceed to address the question of how experience 
impacts on levels of cooperation. What makes a more experienced interviewer better at gaining 
cooperation from respondents? 

The first step involves the addition of interviewer expectation variables to Model 2. The 
results are presented as Model 3 in Table 2. All three expectation variables act in the expected 
direction, although only one achieves statistical significance at traditional levels. It appears 
that those interviewers who have a greater belief in their ability to convince reluctant respondents 
to participate actually achieve higher response rates. 

It should be cautioned that the causal link between expectations and response rates cannot 
be established in a cross-sectional study such as this. It may be that greater success leads to 
greater expectations of future success, rather than the other way around. This interpretation 
opposes the hope that instilling a greater sense of self-efficacy in interviewers will produce 
higher levels of response. Nevertheless, this finding is an intriguing one that demands further 
attention. 

The next step was to add the set of interviewer behaviors into the model. The results can 
be seen in Model 4 in Table 2. Two things can be noted about these results. First, the inclusion 
of this set of interviewer behaviors failed to explain away the effect of tenure. In fact, the 
coefficient for tenure is hardly affected by the addition of either the expectation variables or 
the behavior variables. 

Second, the results for the specific behaviors are somewhat mixed. It was expected that the 
coefficients for all the behavior variables would be positive. This is not the case. The results 
for authority and reciprocation indicate that interviewers who use these techniques achieve 
higher response rates. In contrast, use of the scarcity principle appears to have the opposite 
effect. Pressure on a respondent to meet certain deadlines may well backfire. The remainder 
of the behavior variables do not appear to have a significant effect on the response rates attained 
by Census Bureau interviewers. 

It was suggested earlier that a reduced model, using only repertoire and tailoring, should 
be considered. In Table 2 it was seen that these two variables do not have significant effects 
in the presence of the other behavior variables. Even after removing the other behavior variables 
from the model, repertoire and tailoring still have little impact on response rates. Thus, the 
argument that the way interviewers use various compliance techniques is more important than 
the actual behaviors themselves gains little empirical support from these data. However, the 
measures of these two concepts may be weak, and a better test of their role should be done 
at the contact-level of analysis. 


6. DISCUSSION 


This paper set out to measure whether experienced interviewers achieve higher response rates 
than inexperienced interviewers. It found they do. It then tried to explain why they do. It largely 
failed. One reason may be that the model is incorrect. However, continued discussions with interviewers 
and supervisory staff lead us to believe that this theoretical formulation has some merit. 

Four explanations can be posited. First, the model is being tested at the wrong level of 
aggregation. Although the questionnaire focused on what interviewers usually or typically do, 
we are more interested in how they act in specific situations. A more appropriate test of these 
ideas should be conducted at the contact or household level. Second, the measurement of 
various concepts may be inadequate. Improvements in the translation of concepts from the 
compliance literature into specific interviewer behaviors may be made. Third, it should again 
be noted that these models deal with response rates not refusal rates. It may be that certain 
behaviors are more appropriately directed at persuading sample persons to participate (aimed 
at reducing refusals), while others may serve more to gain access to sample persons (the 
noncontact portion of nonresponse). Separate models for these two processes could not be 
developed here. Finally, other unmeasured characteristics of interviewers (appearance, voice quality, 
dress, etc.) may also play a role in influencing the respondent’s decision. 

These possible shortcomings do not negate the role of these behaviors in affecting response 
rates. Rather, the findings suggest further research and analysis to explore the relationships 
between specific behaviors and their application on the one hand, and interviewer-level response 
rates on the other. We feel that this line of inquiry has merit, and are working toward a fuller 
understanding of the role of interviewer experience, expectations and behavior in survey 
participation. 


APPENDIX A 
VARIABLES USED IN ANALYSES 


The creation of the variables used in the analyses is summarized here. Copies of the 
questionnaire can be obtained from the authors. 


Dependent variable 


Response rate: This is the response rate obtained by each interviewer for the six-month period 
in question, expressed as a percentage. 


Assignment area 


Population density: Population density (persons per square mile). 

Crime rate: Crime rate (crimes per 100,000 population). 

Percent 65 or older: Percentage of population 65 years of age and older. 
Percent under 5: Percentage of population under 5 years of age. 


Household size: Average household size. 


Survey 


Set of dummies to indicate which survey each interviewer works on: 

HIS: Does interviewer work on the Health Interview Survey? 
1 = Yes 
0 = No 

NCS: Does interviewer work on the National Crime Survey? 
1 = Yes 
0 = No 

CE (the Consumer Expenditure Survey) is thus the omitted category. 


Interviewer experience 


Tenure: Measured in days of service employed at the Census Bureau as an 
interviewer, rescaled to fractional years. 

Breadth of experience: A count of the number of different survey organizations for which 
an interviewer has worked. 


Interviewer expectations 


Confidentiality: Interviewers were asked whether they thought there were any situations under 
which the Census Bureau would give individual survey responses to any of 
a number of agencies (FBI, CIA, INS, IRS, state and local government 
agencies). 
1 = High confidentiality belief (Census Bureau would not give responses 
to any of these agencies). 
0 = Low confidentiality belief (Census Bureau would give responses to one 
or more of the agencies). 

Rate/quality: Trade-off between response rate and data quality. Which one of the 
following statements comes closest to how you feel as an interviewer: 
1 = It’s better to persuade a reluctant respondent to participate than to 
accept a refusal. 
0 = It’s better to accept a refusal from a reluctant respondent. 

Efficacy: Interviewers were asked the extent to which they agreed or disagreed with 
the following statement: With enough effort, I can convince even the most 
reluctant respondent to participate. 
Four-point ordinal scale, 1 = strongly disagree, 4 = strongly agree. High 
score indicates greater belief in self-efficacy. 


Interviewer behaviors 


Authority: Interviewers were asked how often they left various materials (request for 
appointment, copy of the advance letter, etc.) at respondents’ homes when 
they found no one at home. The responses to these questions were combined 
to form a scale of frequency of use of these authority-enhancing materials. 
High score indicates greater use of authority. 

Reciprocation: How often do you make a point of complimenting something about 
respondent’s home or personal appearance? 
1 = Always, sometimes 
0 = Rarely, never 

Social proof: How often do you say ‘‘Most people enjoy doing the interview’’? 
1 = Always, sometimes 
0 = Rarely, never 

Saliency: How often do you explain to respondents how the survey results could affect 
them personally? 
1 = Always, sometimes 
0 = Rarely, never 

Scarcity: How often do you tell a respondent that the interview must be completed by 
a certain date? 
1 = Always, sometimes 
0 = Rarely, never 

Consistency: Before a respondent has shown any sign of cooperating, how often do you 
begin asking the survey questions? 
1 = Always, sometimes 
0 = Rarely, never 

Repertoire: In an open-ended question, interviewers were asked to list all things they 
usually do to persuade reluctant respondents to participate. A count of the 
number of distinct things mentioned serves as an indicator of the repertoire 
of techniques available. 

Tailoring: In a series of 15 behavior items, interviewers responded whether they always, 
sometimes, rarely or never performed such behavior. An indicator of tailoring 
in the application of various persuasion techniques is obtained by counting 
the number of times an interviewer used the middle categories (sometimes 
or rarely) to these questions. A high score indicates greater use of tailoring. 
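
The scoring rules for the behavior items can be sketched as follows. This is an illustration only: the function names and the sample answers are hypothetical, and how duplicate mentions were actually detected for the repertoire count is not stated in the appendix, so the normalization used here is an assumption.

```python
MIDDLE_CATEGORIES = {"sometimes", "rarely"}


def binary_behavior(answer):
    """Code a single behavior item: 1 = always/sometimes, 0 = rarely/never."""
    return 1 if answer.lower() in {"always", "sometimes"} else 0


def tailoring_score(answers):
    """Count how many behavior items were answered with a middle category."""
    return sum(1 for a in answers if a.lower() in MIDDLE_CATEGORIES)


def repertoire_score(techniques):
    """Count distinct techniques mentioned (assumed case-insensitive matching)."""
    return len({t.strip().lower() for t in techniques if t.strip()})


# Hypothetical answers of one interviewer to the 15 behavior items.
answers = ["always", "sometimes", "rarely", "never", "sometimes",
           "always", "rarely", "rarely", "never", "sometimes",
           "always", "always", "sometimes", "never", "rarely"]

print(tailoring_score(answers))
print(repertoire_score(["Explain purpose", "explain purpose ", "Offer callback"]))
```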


A Multivariate Procedure Towards Composite Estimation 
of Consumer Expenditure for the U.S. 
Consumer Price Index Numbers 


P. LAHIRI and WENYU WANG¹ 


ABSTRACT 


We consider the problem of estimating the ‘‘cost weights’’ and ‘‘relative importances’’ of different item 
strata for the local market basket areas. The estimation of these parameters is needed to construct the 
U.S. Consumer Price Index Numbers. We use multivariate models to construct composite estimators 
which combine information from relevant sources. The mean squared errors (MSE) of the proposed and 
the existing estimators are estimated using the repeated half samples available from the survey. Based 
on our numerical results, the proposed estimators seem to be superior to the existing estimators. 


KEY WORDS: Consumer expenditure; Composite estimation; Consumer Price Index; Cost weight; 
Diary survey; Half sample; Laspeyres Index; Mean squared error; Synthetic estimation. 


1. INTRODUCTION 


The U.S. Consumer Price Index (CPI) is an indicator of price changes for a set of items, 
goods and services, whose quantity and quality are fixed over a period of time. The U.S. Bureau 
of Labor Statistics (BLS) computes a number of consumer price indices each month for various 
geographical areas, consumer units and item classification (vide BLS Handbook of Methods 
1988). 

The smallest group of item classification for which the BLS computes the CPI is known 
as an ‘‘item stratum’’. It is a prespecified set of consumer goods and services, e.g., fresh whole 
milk, which can be purchased in the retail market during a ‘‘base period’’ by a specified set 
of consumer units. A consumer unit may consist of all members of a particular household 
related by blood, marriage, adoption, or other legal arrangements. A number of item strata 
constitutes an expenditure class (e.g., dairy products). 

The U.S. is divided into eight major areas for sampling purposes. A major area may be either 
‘‘self-representing’’ or ‘‘non-self-representing’’ and belongs to one of the four regions (Northeast, 
Midwest, South and West). A self-representing area consists of all large cities within a region. 
A non-self-representing area generally consists of a county or a group of contiguous counties. 
For publication purposes, a major area is further divided into a number of ‘‘market basket 
areas’’ or ‘‘publication areas’’. 


The Laspeyres formula used by the BLS to compute the CPI for a given area and an 
expenditure class (say, $E$) is defined below. Let 


$P_{it}$ = the average price of all items in the $i$th item stratum at time $t$ ($t = 0, T$), 


$Q_{i0}$ = the quantity of all items in the $i$th item stratum purchased at time $t = 0$ (base period). 


¹ P. Lahiri, Department of Mathematics and Statistics, University of Nebraska-Lincoln, Lincoln, NE 68588-0323, 
USA. Wenyu Wang, SUNY Health Science Center at Brooklyn, Box 1203, 450 Clarkson Avenue, Brooklyn, NY 
11203, USA. 

Then the Laspeyres index at time $t = T$ is given by 

$$I_T = \frac{\sum_{i \in E} Q_{i0} P_{iT}}{\sum_{i \in E} Q_{i0} P_{i0}} = \frac{\sum_{i \in E} C_i (P_{iT}/P_{i0})}{\sum_{i \in E} C_i} = \sum_{i \in E} R_i (P_{iT}/P_{i0}),$$

where 


$C_i = Q_{i0} P_{i0}$ = total expenditure for all items in the $i$th item stratum at $t = 0$, 


$R_i = C_i / \sum_{i \in E} C_i$ = proportion of total expenditure spent on the $i$th item stratum at $t = 0$. 


The quantities $C_i$ and $R_i$ are referred to as the ‘‘cost weight’’ and ‘‘relative importance’’ 
of the $i$th item stratum within the expenditure class $E$. 
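
The equivalence of the quantity form and the relative-importance form of the index can be checked numerically. The sketch below uses three invented item strata; the prices, quantities and function names are hypothetical.

```python
def laspeyres_quantity_form(q0, p0, p_t):
    """Ratio of the base-period basket valued at time-T and base-period prices."""
    return sum(q * p for q, p in zip(q0, p_t)) / sum(q * p for q, p in zip(q0, p0))


def laspeyres_weight_form(q0, p0, p_t):
    """Same index as a relative-importance weighted mean of price relatives."""
    cost_weights = [q * p for q, p in zip(q0, p0)]           # C_i
    total = sum(cost_weights)
    relative_importances = [c / total for c in cost_weights]  # R_i
    return sum(r * (pt / pb) for r, pt, pb in zip(relative_importances, p_t, p0))


# Hypothetical expenditure class with three item strata.
q0 = [10.0, 5.0, 2.0]     # base-period quantities
p0 = [2.0, 4.0, 10.0]     # base-period average prices
p_t = [2.2, 4.0, 12.0]    # average prices at time T

print(laspeyres_quantity_form(q0, p0, p_t))
print(laspeyres_weight_form(q0, p0, p_t))
```

Because only the $R_i$ and the price relatives enter the last form, errors in the estimated relative importances carry over directly to the index, which is why their estimation matters.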

The Bureau of Labor Statistics computes the consumer price indices using data from the 
U.S. Consumer Expenditure Survey (CES). The survey has two different components - Diary 
survey and Interview survey, each having separate sampling schemes and questionnaires. In 
this paper we consider data from the Diary survey only. The sampling design selects all the 
primary stage units (PSU’s) within a particular self-representing area with certainty. But only 
a sample of PSU’s is selected for a particular non-self-representing area according to a probability 
sampling scheme. From each selected PSU, a sample of consumer units (CU’s) is selected again 
using some probability sampling design. Each respondent keeps a diary of expenditures on 
various items for two consecutive 1-week periods. For a detailed account on the CPI and CES, 
the reader is referred to the BLS Handbook of Methods (1988). 

The efficiency of the traditional sample survey estimators of the cost weight and relative 
importance of an item stratum at the publication area level is generally very low compared to 
their efficiency at a larger area (e.g., major area) level. This is due to the fact that only a few 
consumer units are available from a given publication area. Thus, there is a need to improve 
the traditional estimator by borrowing strength from related resources. Marks (1978) and Cohen 
and Sommers (1984) considered certain composite estimators which pool information from 
related areas. Ghosh and Sohn (1990) obtained composite estimators of the cost weight and 
relative importance using an empirical Bayes approach. 

The current procedure used by the Bureau of Labor Statistics consists of several steps. First, 
composite estimators of the relative importances are obtained using a method suggested by 
Cohen and Sommers (1984). The estimators of the cost weights are then obtained from these 
estimators of the relative importances using an iterated ‘‘raking’’ procedure. The final estimates 
of the cost weights for the entire expenditure class and for the major area are identical to the 
corresponding preliminary estimates. One reason for ensuring this ‘‘data consistency’’ by raking 
may be that the preliminary estimators generally perform more satisfactorily 
at a higher level of aggregation than at a lower level. 
At the last step, the final estimators of the relative importances are obtained directly from the 
final cost weight estimators by division. 

Unlike earlier authors, we use the correlations between the item strata in proposing our 
composite estimators in Section 2. The shrinkage factor of the composite estimator obtained 
by minimizing the mean squared error within an appropriate class of estimators involves some 
unknown parameters. These unknown parameters are estimated using the balanced repeated 
replications available from the survey. The estimator proposed by Cohen and Sommers (1984) 
turns out to be a special case of our estimator if one assumes that the preliminary estimators 
are all uncorrelated. 

In Section 2 we concentrate our attention on the estimation of the cost weight of an item 
stratum for a publication area. However, we can obtain estimators of the cost weights at a 
higher level of aggregation (e.g., expenditure class for a publication area, etc.) by appropriate 
summation. From our study, it turns out that in terms of the mean squared error criterion these 
estimators always perform better than the corresponding preliminary estimators and hence 
better than the BLS estimators (note that due to the raking procedure the BLS estimators are 
identical to the preliminary estimators at higher levels of aggregation). 

In Section 3 we propose a composite estimator of relative importance of an item stratum 
at the publication area level. Instead of using the preliminary estimators of the cost weights 
we use the preliminary estimators of the relative importances for all the item strata belonging 
to the expenditure class under consideration. The preliminary estimators of relative importances 
of all the item strata within an expenditure class add up to unity. Thus, the variance covariance 
matrix of the preliminary estimators is singular and this makes the problem different from the 
problem of estimation of the cost weights. Our procedure deletes one item stratum in an 
optimal manner and thus avoids the problem of singularity of the variance covariance matrix 
of the preliminary estimators. Our numerical results show that in terms of the mean squared 
error criterion the proposed estimator is always the best among all the rival estimators 
considered. 

In Section 4, we present all the numerical results. We have evaluated different estimators 
of the cost weight and relative importance based on estimated mean squared error obtained 
by using the balanced repeated half samples (see McCarthy 1969, Ghosh and Sohn 1990). Based 
on our results, the proposed estimators seem to be superior to all the rival estimators considered 
in the paper. 


2. ESTIMATION OF THE COST WEIGHT 


Let $X_{ijl}$ be the average of two consecutive weeks of expenditure for all the items in the $i$th 
item stratum by the $l$th consumer unit belonging to the $j$th publication area within a particular 
major area ($i = 1, \ldots, I$; $j = 1, \ldots, m$; $l = 1, \ldots, n_j$). Let $W_{jl}$ be the sampling weight 
attached to the $l$th consumer unit in the $j$th publication area ($j = 1, \ldots, m$; $l = 1, \ldots, n_j$). 
This represents a number of consumer units in the population and is obtained by the Census 
Bureau using a complex procedure which takes into account various factors such as inclusion 
probabilities, nonresponse, etc. In this section, we consider estimation of $\theta_{ij}$, the true average 
weekly expenditure per consumer unit for the $i$th item stratum and $j$th publication area. The 
cost weight is simply defined as $N_j \theta_{ij}$, where $N_j$ denotes the total number of consumer units 
in the $j$th publication area. The preliminary estimator of $\theta_{ij}$ is given by 

$$Y_{ij} = \sum_{l=1}^{n_j} W_{jl} X_{ijl} \Big/ \sum_{l=1}^{n_j} W_{jl} \quad (i = 1, \ldots, I;\ j = 1, \ldots, m). \tag{2.1}$$

Similarly, the corresponding estimator for the major area is given by 

$$Y_{i\cdot} = \sum_{j=1}^{m} \sum_{l=1}^{n_j} W_{jl} X_{ijl} \Big/ \sum_{j=1}^{m} \sum_{l=1}^{n_j} W_{jl}. \tag{2.2}$$


The variability of $Y_{i\cdot}$ is much lower than that of $Y_{ij}$. Thus, a composite estimator of $\theta_{ij}$ 
which increases the precision is needed. Let $Y_j = (Y_{1j}, \ldots, Y_{Ij})'$ and $\theta_j = (\theta_{1j}, \ldots, \theta_{Ij})'$, 
$j = 1, \ldots, m$. Let $V_j$ be the true variance covariance matrix of $Y_j$ ($j = 1, \ldots, m$). Under 
a synthetic assumption, i.e., $\theta_j = \mu$, an $I \times 1$ column vector ($j = 1, \ldots, m$), the best 
estimator of $\theta_j$ is given by 

$$\hat{\mu} = \Big( \sum_{j=1}^{m} V_j^{-1} \Big)^{-1} \Big( \sum_{j=1}^{m} V_j^{-1} Y_j \Big), \tag{2.3}$$


which is obtained by minimizing $\sum_{j=1}^{m} (Y_j - \mu)' V_j^{-1} (Y_j - \mu)$ with respect to $\mu$. The 
synthetic assumption, however, is hardly ever satisfied. At the other extreme, when there is absolutely 
no similarity between the $\theta_j$'s, it is appropriate to take $Y_j$ as an estimator of $\theta_j$. When the real 
situation is in between these two extremes one may take a composite estimator given by 

$$\hat{\theta}_{ij}(a_{ij}) = (1 - a_{ij}) Y_{ij} + a_{ij} e_i' \hat{\mu}, \tag{2.4}$$


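where the $a_{ij}$'s are constants ($0 \le a_{ij} \le 1$) and $e_i$ is an $I \times 1$ column vector having 1 for the $i$th 
element and 0 for the others. 
We obtain $a_{ij}$ by minimizing the mean squared error 

$$E[\{\hat{\theta}_{ij}(a_{ij}) - \theta_{ij}\}^2 \mid \theta_j,\ j = 1, \ldots, m] \tag{2.5}$$

with respect to $a_{ij}$. The optimal choice is given by 

$$\tilde{a}_{ij} = \frac{e_i' \Big\{ V_j - \Big( \sum_{j'=1}^{m} V_{j'}^{-1} \Big)^{-1} \Big\} e_i}{E[(Y_{ij} - e_i' \hat{\mu})^2 \mid \theta_j,\ j = 1, \ldots, m]}. \tag{2.6}$$

Thus, the optimal estimator of $\theta_{ij}$ in the class described by (2.4) is given by 

$$\tilde{\theta}_{ij} = (1 - \tilde{a}_{ij}) Y_{ij} + \tilde{a}_{ij} e_i' \hat{\mu}. \tag{2.7}$$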


Remark 1: In the derivation of the optimal estimator $\tilde{\theta}_{ij}$ the quantities $V_j$ ($j = 1, \ldots, m$) 
and $E[(Y_{ij} - e_i' \hat{\mu})^2 \mid \theta_j,\ j = 1, \ldots, m]$ are assumed to be fixed and known. 


Remark 2: The estimator proposed by Cohen and Sommers (1984) can be obtained from $\tilde{\theta}_{ij}$ 
as a special case when 

$$V_j = \Big( \sum_{l=1}^{n_j} W_{jl} \Big)^{-1} \mathrm{Diag}(\sigma_1^2, \ldots, \sigma_I^2).$$

Note that according to their assumption the correlation between any two item strata is zero, 
which appears to be very restrictive from our study. 


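Remark 3: Note that using a familiar matrix inversion result (see Rao 1973), 

$$V_j - \Big( \sum_{j'=1}^{m} V_{j'}^{-1} \Big)^{-1} = V_j \Big[ V_j + \Big( \sum_{j' \ne j} V_{j'}^{-1} \Big)^{-1} \Big]^{-1} V_j,$$

which is positive definite. Also, 

$$E[(Y_{ij} - e_i' \hat{\mu})^2 \mid \theta_j,\ j = 1, \ldots, m] = e_i' V_j \Big[ V_j + \Big( \sum_{j' \ne j} V_{j'}^{-1} \Big)^{-1} \Big]^{-1} V_j e_i + \Big\{ \theta_{ij} - e_i' \Big( \sum_{j'=1}^{m} V_{j'}^{-1} \Big)^{-1} \Big( \sum_{j'=1}^{m} V_{j'}^{-1} \theta_{j'} \Big) \Big\}^2.$$

Also, when $\theta_j = \mu$, one gets $\tilde{a}_{ij} = 1$ and thus $\tilde{\theta}_{ij} = e_i' \hat{\mu}$. Otherwise the size of the shrinkage 
factor depends on the size of 

$$\Big| \theta_{ij} - e_i' \Big( \sum_{j'=1}^{m} V_{j'}^{-1} \Big)^{-1} \Big( \sum_{j'=1}^{m} V_{j'}^{-1} \theta_{j'} \Big) \Big|.$$

The larger the distance of $\theta_{ij}$ from $e_i' ( \sum_{j'=1}^{m} V_{j'}^{-1} )^{-1} ( \sum_{j'=1}^{m} V_{j'}^{-1} \theta_{j'} )$, the smaller is the size of $\tilde{a}_{ij}$. 
This means that if a particular area is very different from the general nature of all the areas 
then our procedure will give less weight to the synthetic part of the estimator. This explains 
the great deal of variation of the shrinkage factors in Table 1. 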
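
The behavior of the optimal shrinkage factor can be illustrated with a small sketch for two item strata and three publication areas. All inputs are invented, the helper names are hypothetical, and the true covariance matrices and means are treated as known, as in Remark 1. The denominator of the shrinkage factor is computed as a variance part plus a squared bias, which assumes the preliminary estimators of different areas are independent.

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def matvec(a, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def pool(vectors, covs):
    """Precision-weighted pooling as in (2.3); also returns (sum of V_j^-1)^-1."""
    precision = [[0.0, 0.0], [0.0, 0.0]]
    weighted = [0.0, 0.0]
    for vec, cov in zip(vectors, covs):
        cov_inv = inv2(cov)
        contrib = matvec(cov_inv, vec)
        for r in range(2):
            weighted[r] += contrib[r]
            for c in range(2):
                precision[r][c] += cov_inv[r][c]
    pooled_cov = inv2(precision)
    return matvec(pooled_cov, weighted), pooled_cov


def optimal_shrinkage(i, j, theta, covs):
    """Shrinkage factor of (2.6) for known covariance matrices and true means."""
    pooled_theta, pooled_cov = pool(theta, covs)
    variance_part = covs[j][i][i] - pooled_cov[i][i]
    bias = theta[j][i] - pooled_theta[i]
    return variance_part / (variance_part + bias ** 2)


def composite(i, j, y, theta, covs):
    """Composite estimator (2.4) evaluated at the optimal shrinkage factor."""
    mu_hat, _ = pool(y, covs)
    a = optimal_shrinkage(i, j, theta, covs)
    return (1 - a) * y[j][i] + a * mu_hat[i]


# Hypothetical inputs: I = 2 item strata, m = 3 publication areas.
covs = [[[4.0, 1.0], [1.0, 3.0]], [[2.0, 0.5], [0.5, 2.0]], [[5.0, -1.0], [-1.0, 4.0]]]
theta = [[10.0, 20.0], [12.0, 19.0], [30.0, 21.0]]
y = [[11.0, 19.0], [12.5, 20.0], [28.0, 22.0]]

a_typical = optimal_shrinkage(0, 0, theta, covs)   # area closer to the pooled mean
a_outlying = optimal_shrinkage(0, 2, theta, covs)  # area far from the pooled mean
print(a_typical, a_outlying, composite(0, 2, y, theta, covs))
```

In this example the third area differs most from the others in the first item stratum, so it receives the smaller shrinkage factor and its composite estimate stays close to its own preliminary estimate.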

We shall estimate $\tilde{a}_{ij}$ using the 20 balanced repeated half samples available from the survey. 
Let $W_{jl}^{(k)}$ denote the weight assigned to the $l$th consumer unit of the $j$th area for the $k$th 
replication ($j = 1, \ldots, m$; $l = 1, \ldots, n_j$; $k = 1, \ldots, 20$). These replicated weights are 
constructed by the Census Bureau using a complex procedure. For any replication, 
approximately half the consumer units receive zero weights and the remaining consumer units receive 
positive weights. 
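
A minimal sketch of how replicate weights yield a variability estimate: the weighted mean is recomputed with each set of replicate weights, and the squared deviations from the full-sample estimate are averaged. The data are invented, only two replicates are used instead of the survey's 20, and the helper names are hypothetical.

```python
import math


def weighted_mean(x, w):
    """Weighted mean of expenditures x with sampling weights w."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def half_sample_sd(x, full_weights, replicate_weights):
    """Root mean squared deviation of replicate estimates from the full-sample estimate."""
    full = weighted_mean(x, full_weights)
    reps = [weighted_mean(x, w) for w in replicate_weights]
    return math.sqrt(sum((r - full) ** 2 for r in reps) / len(reps))


# Hypothetical area with four consumer units and two half-sample replicates.
x = [10.0, 20.0, 30.0, 40.0]
full_weights = [1.0, 1.0, 1.0, 1.0]
replicate_weights = [
    [2.0, 0.0, 2.0, 0.0],   # units 2 and 4 receive zero weight
    [0.0, 2.0, 0.0, 2.0],   # units 1 and 3 receive zero weight
]

print(half_sample_sd(x, full_weights, replicate_weights))
```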


Table 1 
Shrinkage Factors $\hat{a}_{ij}$ in West Non-Self-Representing Area 

        1            2            3 
1    0.8479225    0.7057626    0.9214804 
2    0.8434894    0.5692695    0.8092725 
3    0.0969009    0.0786758    0.6953904 
4    0.4446537    0.5444809    | 
5    0.6999551    0.3460123    0.5487382 
6    0.0318442    0.4981756    0.2598752 

Then we propose the following estimator of $\theta_{ij}$: 

$$\hat{\theta}^{*}_{ij} = (1 - \hat{a}^{*}_{ij}) Y_{ij} + \hat{a}^{*}_{ij} e_i' \hat{\mu}. \tag{2.8}$$


Remark 4: Using the argument given in Remark 3, $\hat{a}^{*}_{ij} \ge 0$. But it is possible that sometimes $\hat{a}^{*}_{ij}$ 
may exceed unity. Thus, we consider the following estimator: 

$$\hat{\theta}_{ij} = (1 - \hat{a}_{ij}) Y_{ij} + \hat{a}_{ij} e_i' \hat{\mu}, \tag{2.9}$$

where $\hat{a}_{ij} = \min[1, \hat{a}^{*}_{ij}]$. 


In Table 1, we give values of $\hat{a}_{ij}$ for the West non-self-representing area. 


3. ESTIMATION OF THE RELATIVE IMPORTANCE 


Let $R_{ij} = Y_{ij} / \sum_{i=1}^{I} Y_{ij}$ be the preliminary estimator of the relative importance 
$r_{ij} = \theta_{ij} / \sum_{i=1}^{I} \theta_{ij}$ ($i = 1, \ldots, I$; $j = 1, \ldots, m$), and let $R_j = (R_{1j}, \ldots, R_{Ij})'$. Since 
$\sum_{i=1}^{I} R_{ij} = 1$ ($j = 1, \ldots, m$), the variance covariance matrix of $R_j$ is singular. Thus, the 
method described in Section 2 is not directly applicable to this situation. In order to avoid this 
singularity problem, we delete one item stratum from the expenditure class under consideration. 
Without any loss of generality, let the $I$th item stratum be deleted. Then apply the procedure 
described in Section 2 to obtain the following estimator for $r_{ij}$ ($i = 1, \ldots, I - 1$; 
$j = 1, \ldots, m$): 

$$\hat{r}^{*}_{ij} = (1 - \hat{a}_{ij}) R_{ij} + \hat{a}_{ij} e_i' \hat{\xi}, \tag{3.1}$$

where $\hat{a}_{ij}$ and $\hat{\xi}$ are obtained as in Section 2 with the $R_{ij}$'s ($i = 1, \ldots, I - 1$) in place of the $Y_{ij}$'s. 

We estimate $r_{Ij}$ by a univariate procedure which yields the following estimator of $r_{Ij}$: 

$$\hat{r}^{*}_{Ij} = (1 - \hat{a}_{Ij}) R_{Ij} + \hat{a}_{Ij} R_{I\cdot}.$$

We obtain the final estimator of $r_j$ as $\hat{r}_j = (\hat{r}_{1j}, \ldots, \hat{r}_{Ij})'$, where $\hat{r}_{ij} = \hat{r}^{*}_{ij} / \sum_{i=1}^{I} \hat{r}^{*}_{ij}$. There 
are $I$ possible choices of deleting one item stratum. We choose the combination which yields 
the smallest average (over item strata) estimated MSE. One may obtain an alternative estimator 
of $r_{Ij}$ by subtracting $\sum_{i=1}^{I-1} \hat{r}^{*}_{ij}$ from unity. However, according to this procedure, there is a 
positive probability that the estimate of $r_{Ij}$ is negative. 
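
The singularity that motivates deleting one item stratum, and the final normalization step, can both be checked numerically. In the sketch below the replicate expenditure vectors and the composite values are invented, and the function names are hypothetical. Because the relative importances in each replicate sum to one, every row of their covariance matrix sums to zero, so the matrix cannot be inverted.

```python
def relative_importances(values):
    """Divide each element by the total so that the results sum to one."""
    total = sum(values)
    return [v / total for v in values]


def covariance_matrix(rows):
    """Covariance matrix of the columns of `rows` (divisor: number of rows)."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[c] for row in rows) / n for c in range(k)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in rows) / n
             for b in range(k)] for a in range(k)]


# Hypothetical replicate estimates of expenditure for I = 3 item strata.
replicate_y = [[10.0, 5.0, 5.0], [12.0, 4.0, 6.0], [9.0, 6.0, 4.0], [11.0, 5.5, 5.0]]
replicate_r = [relative_importances(y) for y in replicate_y]
row_sums = [sum(row) for row in covariance_matrix(replicate_r)]
print(row_sums)   # each entry is zero up to rounding error

# Hypothetical composite values before normalization; they need not sum to one.
r_star = [0.52, 0.27, 0.23]
r_final = relative_importances(r_star)
print(r_final, sum(r_final))
```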


4. NUMERICAL RESULTS 


In this section, we evaluate various estimators of the cost weight and relative importance
based on estimated mean squared error. We consider four rival estimators: the preliminary
estimator, the estimator proposed by Cohen and Sommers (1984), the estimator currently used
by the BLS, and the empirical Bayes estimator considered recently by Ghosh and Sohn (1990).
The Cohen-Sommers estimator of the cost weight (before raking) is given by


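$$
\hat\theta^{CS}_{ij} =
\begin{cases}
\hat\theta^{CS*}_{ij} & \text{if } |\hat\theta^{CS*}_{ij} - Y_{ij}| < c\cdot sd(Y_{ij}),\\
Y_{ij} + c\cdot sd(Y_{ij}) & \text{if } \hat\theta^{CS*}_{ij} \ge Y_{ij} + c\cdot sd(Y_{ij}),\\
Y_{ij} - c\cdot sd(Y_{ij}) & \text{if } \hat\theta^{CS*}_{ij} \le Y_{ij} - c\cdot sd(Y_{ij}),
\end{cases}
$$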
where 


ACSe _ “CS “CS 
Og RY ae Xp, 


1 20 1 20 
ACS j and ais kee 2 arn (Ke Vad 
a = mint Tl, (1 — NAN yes Y hig Y; ; 
j ( , 55 vf [ob || 


m ny m 
yi") = ye > Wi Xf ay: 1D Wi, 
ei Vial j 


ee ; 
any = {EE 0 — mp}. 


k=] 


$c$ = a safety factor determined by the BLS (see Table 2).
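The effect of the safety factor is to keep the shrunken estimate within a band of half-width $c \cdot sd$ around the preliminary estimate. A minimal sketch of that truncation step follows; the function name `clip_to_band` and the numbers are hypothetical, and the computation of the untruncated estimate and of the standard deviation is not shown.

```python
def clip_to_band(shrunk, direct, sd, c):
    """Keep `shrunk` within c * sd of the direct (preliminary) estimate."""
    lower = direct - c * sd
    upper = direct + c * sd
    return max(lower, min(upper, shrunk))


# Hypothetical numbers: direct estimate 3.0 with standard deviation 1.0.
print(clip_to_band(3.2, 3.0, 1.0, 1.0))  # inside the band, unchanged
print(clip_to_band(5.0, 3.0, 1.0, 1.0))  # above the band, pulled to the upper limit
print(clip_to_band(0.0, 3.0, 1.0, 0.5))  # below the band, pulled to the lower limit
```

A smaller value of `c` keeps the final estimate closer to the preliminary estimate, whatever the amount of shrinkage.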
Table 2
Values of the Safety Factor c for the Major Areas


Major    NCNS   NCSR   NENS   NESR   SSNS   SSSR   WWNS   WWSR
Area       1      2      3      4      5      6      7      8
c        1.0      5    1.0      5    3.0      5    1.0      5


NCNS: North Central (Midwest) non-self-representing.
NCSR: North Central self-representing.
NENS: North East non-self-representing.
NESR: North East self-representing.
SSNS: South non-self-representing.
SSSR: South self-representing.
WWNS: West non-self-representing.
WWSR: West self-representing.


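Their estimator for the relative importance is given by

$$
\hat r^{CS}_{ij} =
\begin{cases}
\hat r^{CS*}_{ij} & \text{if } |\hat r^{CS*}_{ij} - R_{ij}| \le c\cdot sd(R_{ij}),\\
R_{ij} + c\cdot sd(R_{ij}) & \text{if } \hat r^{CS*}_{ij} > R_{ij} + c\cdot sd(R_{ij}),\\
R_{ij} - c\cdot sd(R_{ij}) & \text{if } \hat r^{CS*}_{ij} < R_{ij} - c\cdot sd(R_{ij}),
\end{cases}
$$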


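where

$$\hat r^{CS*}_{ij} = (1 - d^{CS}_{ij})\,R_{ij} + d^{CS}_{ij}\,\bar R^{CS}_{i},$$

$$\bar R^{CS}_{i} = \sum_{j=1}^{m}\sum_{l=1}^{n_j} W_{jl}X_{ijl} \Big/ \sum_{i=1}^{I}\sum_{j=1}^{m}\sum_{l=1}^{n_j} W_{jl}X_{ijl},$$

$$
d^{CS}_{ij} =
\begin{cases}
d^{CS*}_{ij} & \text{if } 0 \le d^{CS*}_{ij} \le 1,\\
0 & \text{if } d^{CS*}_{ij} < 0,\\
1 & \text{if } d^{CS*}_{ij} > 1,
\end{cases}
$$

$$d^{CS*}_{ij} = \frac{\dfrac{1}{20}\sum_{k=1}^{20}(R^{(k)}_{ij} - R_{ij})^2 - \dfrac{1}{20}\sum_{k=1}^{20}(R^{(k)}_{ij} - R_{ij})(\bar R^{CS(k)}_{i} - \bar R^{CS}_{i})}{\dfrac{1}{20}\sum_{k=1}^{20}(R^{(k)}_{ij} - \bar R^{CS(k)}_{i})^2},$$

$$\bar R^{CS(k)}_{i} = \sum_{j=1}^{m}\sum_{l=1}^{n_j} W^{(k)}_{jl}X^{(k)}_{ijl} \Big/ \sum_{i=1}^{I}\sum_{j=1}^{m}\sum_{l=1}^{n_j} W^{(k)}_{jl}X^{(k)}_{ijl},$$

$$sd(R_{ij}) = \Big\{\frac{1}{20}\sum_{k=1}^{20}(R^{(k)}_{ij} - R_{ij})^2\Big\}^{1/2}.$$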


Since $\sum_{i=1}^{I}\hat r^{CS}_{ij} \ne 1$, for comparison purposes we have divided $\hat r^{CS}_{ij}$ by $\sum_{i=1}^{I}\hat r^{CS}_{ij}$.


The estimator currently used by the Bureau of Labor Statistics (see United States Department
of Labor 1988) is obtained in a number of steps.


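Step 1: Obtain an estimator of the cost weight as follows:

$$\hat\theta^{CS(1)}_{ij} = \hat r^{CS}_{ij}\sum_{i=1}^{I} Y_{ij}.$$


Step 2: The final estimator of $\theta_{ij}$ is obtained from $\hat\theta^{CS(1)}_{ij}$ using a "raking" procedure. The final
estimator, denoted by $\hat\theta^{BLS}_{ij}$, satisfies the following two conditions:

$$\sum_{i=1}^{I}\hat\theta^{BLS}_{ij} = \sum_{i=1}^{I} Y_{ij}, \qquad \sum_{j=1}^{m} N_j\,\hat\theta^{BLS}_{ij} = \sum_{j=1}^{m} N_j\,Y_{ij}.$$


Step 3: Finally, an estimator of the relative importance is obtained as follows:

$$\hat r^{BLS}_{ij} = \hat\theta^{BLS}_{ij} \Big/ \sum_{i=1}^{I}\hat\theta^{BLS}_{ij}.$$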


In our numerical work, we have estimated $N_j$ by $\sum_{l=1}^{n_j} W_{jl}$.
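One common way to meet two sets of margin constraints of this kind is iterative proportional fitting. The sketch below is an illustration under that assumption only: it is not taken from the BLS documentation, and the function name `rake`, the toy numbers and the fixed number of sweeps are hypothetical. It assumes that all starting values and weights are positive.

```python
def rake(theta0, y, n, sweeps=500):
    """Iterative proportional fitting of theta0 (I x m) to the margins of y.

    Targets, for item strata i and areas j:
      sum_i theta[i][j]        == sum_i y[i][j]          (each area j)
      sum_j n[j] * theta[i][j] == sum_j n[j] * y[i][j]   (each stratum i)
    """
    I, m = len(y), len(n)
    # Work on the weighted table t[i][j] = n[j] * theta[i][j].
    t = [[n[j] * theta0[i][j] for j in range(m)] for i in range(I)]
    row_target = [sum(n[j] * y[i][j] for j in range(m)) for i in range(I)]
    col_target = [n[j] * sum(y[i][j] for i in range(I)) for j in range(m)]
    for _ in range(sweeps):
        for i in range(I):  # rescale each stratum (row) to its target
            s = sum(t[i])
            t[i] = [v * row_target[i] / s for v in t[i]]
        for j in range(m):  # rescale each area (column) to its target
            s = sum(t[i][j] for i in range(I))
            for i in range(I):
                t[i][j] *= col_target[j] / s
    return [[t[i][j] / n[j] for j in range(m)] for i in range(I)]


# Hypothetical data: three item strata, two areas.
y = [[2.0, 3.0], [1.0, 4.0], [3.0, 1.0]]
start = [[2.2, 2.8], [1.1, 3.5], [2.5, 1.5]]
n = [10.0, 20.0]
raked = rake(start, y, n)
print([sum(col) for col in zip(*raked)])  # area totals of the raked table
```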


The MSE of an estimator $e_{ij}$ of $\theta_{ij}$ is given by

$$\mathrm{MSE} = E(e_{ij} - \theta_{ij})^2 = E(e_{ij} - Y_{ij})^2 - V(Y_{ij}) + 2\,\mathrm{Cov}(e_{ij}, Y_{ij}),$$

where it is assumed that $E(Y_{ij} \mid \theta_{ij}) = \theta_{ij}$. The above formula is given in Cohen and Sommers
(1984). As in Ghosh and Sohn (1990), we estimate the three terms using the balanced repeated
half samples available from the survey. For example,

$$\hat E(e_{ij} - Y_{ij})^2 = \frac{1}{20}\sum_{k=1}^{20}(e^{(k)}_{ij} - Y^{(k)}_{ij})^2,$$

$$\hat V(Y_{ij}) = \frac{1}{20}\sum_{k=1}^{20}(Y^{(k)}_{ij} - Y_{ij})^2,$$
Table 3
Average Estimated MSE's for Different Estimators of $\theta_{ij}$


         Average Estimated MSE of
Major
Area     $Y_{ij}$    $\hat\theta^{GS}_{ij}$      $\hat\theta^{CS}_{ij}$      $\hat\theta^{BLS}_{ij}$     $\hat\theta_{ij}$

NCNS     .020047     .011549 (22)     .009342 (33)     .014885 (25)     .009428 (52)
NCSR     .036620     .024783 (32)     .016017 (56)     .023627 (35)     .016155 (55)
NENS     .018162     .013299 (26)     .007327 (59)     .013046 (28)     .005504 (69)
NESR     .052883     .051100 (3)      .038911 (26)     .045610 (13)     .028958 (45)
SSNS     .021757     .013146 (39)     .009954 (54)     .014415 (33)     .006418 (70)
SSSR     .047500     .028984 (38)     .031743 (33)     .044238 (6)      .009270 (80)
WWNS     .052387     .029938 (42)     .017433 (66)     .030069 (42)     .010849 (79)
WWSR     .018223     .033529 (-83)    .009925 (45)     .014898 (18)     .005761 (68)


Note: The figures in parentheses represent percent improvement over the preliminary estimator, $Y_{ij}$.


$$\widehat{\mathrm{Cov}}(e_{ij}, Y_{ij}) = \frac{1}{20}\sum_{k=1}^{20}(e^{(k)}_{ij} - e_{ij})(Y^{(k)}_{ij} - Y_{ij}).$$

In the above, $e^{(k)}_{ij}$ is the estimator $e_{ij}$ based on the $k$th half sample ($k = 1, \ldots, 20$). For
example,
Oy SG IG ear YS, 


$$\hat\theta^{(k)}_{ij} = (1 - \hat a_{ij})\,Y^{(k)}_{ij} + \hat a_{ij}\, e_i'\hat\mu^{(k)}.$$


We obtain $\hat\theta^{BLS(k)}_{ij}$ by the multistep procedure used to obtain $\hat\theta^{BLS}_{ij}$, where we replace $Y_{ij}$,
$R_{ij}$ and $\hat r^{CS}_{ij}$ by $Y^{(k)}_{ij}$, $R^{(k)}_{ij}$ and $\hat r^{CS(k)}_{ij}$, respectively. Note that the above procedure does not take into
account the variability due to the estimation of the coefficients (i.e., the $a_{ij}$'s) in the composite
estimators. Cohen and Sommers (1984) recommended the use of half samples of half samples,
or quarter samples, to capture this additional variability. We could not use their procedure since
our dataset did not contain these quarter samples.
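The three-term MSE estimate can be sketched for a single publication area and item stratum as follows. This is a sketch under stated assumptions rather than the authors' program: the function name `brr_mse` and the toy replicate values are hypothetical, and the first term is taken to be the average squared difference between the replicate estimator and the replicate preliminary estimator.

```python
def brr_mse(e_full, y_full, e_reps, y_reps):
    """Estimate E(e - Y)^2 - V(Y) + 2 Cov(e, Y) from half-sample replicates.

    e_full, y_full: full-sample values of the estimator and of Y.
    e_reps, y_reps: their values on each half sample (same length).
    """
    k = len(e_reps)
    if k == 0 or k != len(y_reps):
        raise ValueError("need matching, non-empty replicate lists")
    sq_diff = sum((e - y) ** 2 for e, y in zip(e_reps, y_reps)) / k
    var_y = sum((y - y_full) ** 2 for y in y_reps) / k
    cov = sum((e - e_full) * (y - y_full) for e, y in zip(e_reps, y_reps)) / k
    return sq_diff - var_y + 2.0 * cov


# Hypothetical replicates. When the estimator is Y itself, the
# estimate reduces to the estimated variance of Y.
y_reps = [1.0, 3.0]
print(brr_mse(2.0, 2.0, y_reps, y_reps))
```

Because the estimate is a difference of estimated terms, it can be negative in small samples, which is one reason to average it over item strata and publication areas.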

The data we analyze come from the 1982-83 Consumer Expenditure Survey (Diary survey). The
expenditure class we consider is dairy products. There are six item strata in this class.
They are (1) fresh whole milk, (2) other fresh milk and cream, (3) butter, (4) cheese, (5) ice 
cream and related products, and (6) other dairy products. 
The MSE's of all the estimators considered are estimated for each publication area and item
stratum. In Table 3 we report the average estimated MSE's of the estimators of $\theta_{ij}$, the average
being taken over all the item strata and all the publication areas within a major area. Notice
that all the composite estimators except the one proposed by Ghosh and Sohn (1990) are better
than the preliminary estimator for all the major areas in the average MSE sense. Both $\hat\theta^{CS}_{ij}$ and
$\hat\theta_{ij}$ are better than $\hat\theta^{BLS}_{ij}$. Our proposed estimator $\hat\theta_{ij}$ is better than $\hat\theta^{CS}_{ij}$ in six out of eight
major areas. In two major areas (NCNS and NCSR), $\hat\theta^{CS}_{ij}$ is better than $\hat\theta_{ij}$, but the difference
is negligible.

In Tables 4 and 5, we try to demonstrate that the raking procedure may not be necessary.
In Table 4, the parameter of interest is $\sum_{i=1}^{I}\theta_{ij}$, the true cost weight for the expenditure class.
Here, due to the "raking" procedure, $\sum_{i=1}^{I}\hat\theta^{BLS}_{ij} = \sum_{i=1}^{I} Y_{ij}$. We propose the alternative
estimator $\sum_{i=1}^{I}\hat\theta_{ij}$ and compare its average estimated MSE (over publication areas in a
major area) with that of $\sum_{i=1}^{I} Y_{ij}$. In all cases, we gain considerably.


Table 4
Average Estimated MSE's of Two Estimators of Average Consumer
Expenditure for the Expenditure Class


Major    Preliminary    Proposed     Percent
Area     Estimator      Estimator    Improvement
NCNS     0.12384        0.07969      36
NCSR     0.29819        0.13040      56
NENS     0.21658        0.07602      65
NESR     0.67486        0.20119      70
SSNS     0.21506        0.08303      61
SSSR     0.68415        0.06462      90
WWNS     0.35446        0.05175      85
WWSR     0.19292        0.05524      71

Table 5
Average Estimated MSE's of Two Estimators of Average Consumer
Expenditure for the Major Area


Major    Preliminary    Proposed     Percent
Area     Estimator      Estimator    Improvement
NCNS     0.008181       0.0045468    44
NCSR     0.003672       0.0031047    15
NENS     0.006174       0.0029128    53
NESR     0.011680       0.0056922    51
SSNS     0.007501       0.0036401    51
SSSR     0.004434       0.0013751    69
WWNS     0.008203       0.0022560    72
WWSR     0.002786       0.0007882    72

In Table 5, the parameter of interest is the cost weight of an item stratum for the major
area. The preliminary estimator (identical to the BLS estimator due to the raking procedure)
is $\sum_{j=1}^{m}\sum_{l=1}^{n_j} W_{jl}Y_{ij} \big/ \big(\sum_{j=1}^{m}\sum_{l=1}^{n_j} W_{jl}\big)$. Our estimation procedure can also generate
estimators at the major area level. We propose the estimator $\hat\theta_{i\cdot} = \sum_{j=1}^{m}\sum_{l=1}^{n_j} W_{jl}\hat\theta_{ij} \big/
\big(\sum_{j=1}^{m}\sum_{l=1}^{n_j} W_{jl}\big)$. The average estimated MSE's for these two estimators are reported in
Table 5. Here also our estimator is superior to the preliminary (BLS) estimator.

The results of Tables 4 and 5 suggest that the data consistency step followed by the BLS may
not be necessary. Indeed, it may also be possible to improve the traditional estimators at higher
levels of aggregation.

Table 6 provides the average estimated MSE's (over all the item strata and publication areas
in a major area) of various estimators of relative importance. Notice that, as in Table 3, all
the estimators other than $\hat r^{GS}_{ij}$ are better than the preliminary estimator $R_{ij}$ for all the major
areas. Our proposed estimator $\hat r_{ij}$ is the best among all the estimators considered.

Recently, Swanson (1992) has compared different methods of estimating cost weights for 
12 of the approximately 70 expenditure classes in the CPI. His investigation shows that overall 
our proposed method is superior to all the rival methods. 


Table 6
Average Estimated MSE's for Different Estimators of Relative Importance


         Average Estimated MSE of
Major
Area     $R_{ij}$     $\hat r^{GS}_{ij}$          $\hat r^{CS}_{ij}$          $\hat r^{BLS}_{ij}$         $\hat r_{ij}$

NCNS     .0006342     .00046480 (27)      .00033143 (48)     .00042130 (34)     .00018592 (71)
NCSR     .0009125     .00071967 (21)      .00040226 (56)     .00044815 (51)     .00021309 (77)
NENS     .0003588     .00026894 (25)      .00014146 (61)     .0001620 (55)      .00011105 (69)
NESR     .0004264     .00072001 (-69)     .00028862 (32)     .00030555 (28)     .00016744 (61)
SSNS     .0005071     .00033736 (33)      .00019352 (62)     .00021385 (58)     .00011925 (76)
SSSR     .0006564     .00048569 (26)      .00053173 (19)     .00053603 (18)     .00030979 (53)
WWNS     .0013709     .00086849 (37)      .00051474 (62)     .00061901 (55)     .00028519 (79)
WWSR     .0003540     .00070770 (-100)    .00021384 (40)     .00023255 (34)     .00013750 (61)


Note: The figures in parentheses represent percent improvement over $R_{ij}$.
Table 2
Results of weighted least squares (WLS) regression analyses of interviewer-level response rates for the NCS, HIS and CE surveys

                                          Model 1                  Model 2                  Model 3                  Model 4
                                          Coefficient (Std. err.)  Coefficient (Std. err.)  Coefficient (Std. err.)  Coefficient (Std. err.)

Intercept                                 96.94 (3.19)             96.21 (5.39)             94.95 B25)               93.44 (3.35)

Assignment area:
  Population density                      -0.00017** (0.000023)    -0.000078* (0.000038)    -0.000084* (0.000038)    -0.000071 (0.000038)
  Crime rate                              -0.00024** (0.000055)    -0.00021** (0.000055)    -0.00023** (0.000056)    -0.00022** (0.000056)
  Percentage of persons aged 65 or over   -0.057 (0.051)           -0.054 (0.050)           -0.061 (0.051)           -0.061 (0.052)
  Percentage of persons under 5           0.41* (0.16)             ORT (0.16)               0.29 (0.17)              OP 3 (0.17)
  Household size                          -3.20* (1.70)            -2.92* (1.24)            -2.88* (1.26)            -3.09* (27)

Survey indicators:
  NCS¹                                    Seize" (0.40)            G20) aa (0.40)           6.68** (0.41)            6.55** (0.42)
  HIS¹                                    520), * (0.46)           526326 (0.46)            5.64** (0.47)            6.05** (0.48)

Interviewer experience:
  Log (tenure)                            02627 * (0.14)           Og4ats (0.14)            0.69** (0.15)            Oa a (0.15)
  Log (tenure) * density                                           -0.00010** (0.000032)    -0.00011** (0.000032)    -0.00011** (0.000032)

Interviewer expectations:
  Confidentiality                                                                           0.61 (0.37)              0.59 (0.37)
  Rate/quality                                                                              0.046 (0.40)             -0.00073 (0.41)
  Efficacy                                                                                  Uae! (0.15)              0:53*3 (0.15)

Interviewer behaviours:
  Authority                                                                                                          0.14** (0.055)
  Exchange                                                                                                           0.67* (0.29)
  Social validation                                                                                                  0.18 (0.32)
  Relevance                                                                                                          -0.19 (0.33)
  Scarcity                                                                                                           -0.66* (0.29)
  Consistency                                                                                                        -0.21 (0.29)
  Repertoire                                                                                                         -0.0068 (0.065)
  Adaptation                                                                                                         -0.042 (0.054)

Adjusted R²                               0.3553                   0.3640                   0.3784                   0.3873
(n)                                       (679)                    (679)                    (645)                    (639)



Fra 20 
Fp <0 


1 Interviewers working on the CE survey are the omitted category.
Figure 1. Model of survey participation: the role of the interviewer.